Student Project

RunSync: Music Meets Running Training

Team members

Summary 

This project aims to leverage biosignals, potentially including heart rate (HR), heart rate variability (HRV), and electrodermal activity (EDA), that commercial-grade wearable devices such as smartwatches and smart wristbands can collect easily and non-invasively. The final product will use the collected biosignals as input to a music recommendation algorithm for medium-distance runners, to help improve their running experience and, ideally, their performance.

Motivation

According to a recent report, an estimated 621.16 million people participate in running worldwide, which generates a market of more than 41.8 billion dollars in running gear sales [1]. In the United States alone, approximately 49 million people participated in some kind of running, jogging, or trail running activity [2]. In this enormous market, one prevalent habit among runners is listening to music while running. According to a survey published by Runner's World, 72% of runners listen to something while they run [3]. One of the main reasons for this pattern might be that music can mask fatigue during exercise [4].

Numerous studies have tried to correlate listening to music with sports performance across multiple performance levels [literature review]. Giving a definitive answer about how, and how much, music can enhance sports results is exceptionally difficult. Nevertheless, new evidence continues to emerge that music and its psychological influence can improve sports performance in general. One of the most common ways to quantify such improvement is a metric called Perceived Exertion, measured through the Borg Rating of Perceived Exertion (RPE) scale [5]. Recent studies have identified how, for example, familiar songs positively impact both competitive and recreational athletes by distracting them from the effort and changing their perception of tiredness, reducing their RPE [6].

It is essential to acknowledge that in such a complex setting, this project does not intend to provide quantitative reasoning to defend either argument on how music can improve sports performance. Nevertheless, we aim to leverage the discussed principles as opportunities and develop a product with a satisfactory product-market fit for a noteworthy addressable market size. Therefore, to reduce the complexity of the scope, this project will focus solely on analyzing one particular aerobic activity: jogging/running [7].

The objective of this project is to harness the potential of biofeedback through HR and HRV to optimize the musical accompaniment for medium-distance runs. By tailoring music recommendations to the runner's real-time physiological indicators, we aim to enhance their motivation and overall performance, ultimately contributing to their journey towards marathon success. This research addresses the following key motivations:

  • Personalized Running Experience: Individuals pursuing marathon goals often seek ways to optimize their training experience. Personalized music recommendations based on their physiological data can create a more immersive and motivational running experience.
  • Efficiency and Effectiveness: A well-structured playlist can help runners maintain an optimal pace, manage their energy levels, and stay motivated during the run. This project aims to enhance the efficiency and effectiveness of marathon training through music.
  • Scientific Exploration: Investigating the relationship between HR, HRV, and music preference can contribute to a deeper understanding of how physiological markers interact with external stimuli like music, shedding light on the mechanisms underlying human performance enhancement.

Methods 

The research methods for this project are designed to investigate the relationship between physiological data and music recommendation for medium-distance runs, taking into account both quantitative and qualitative aspects:

Data Collection

  • Utilize biosensors such as the Apple Watch, Empatica E4, or EmotiBit to collect real-time physiological data from participants during medium-distance runs. These sensors will capture essential parameters such as heart rate (HR) and heart rate variability (HRV).
  • Additionally, gather supplementary information through surveys or secondary research, including data on pace, terrain, weather conditions, and self-reported mood or perceived exertion, to provide context for the physiological responses.

User Feedback and Validation

  • Conduct user studies to obtain qualitative feedback on the recommended playlists. Participants' perceptions of motivation and performance enhancement will be assessed.
  • Employ subjective measures like the Rating of Perceived Exertion to validate the effectiveness of the music recommendations.

Analyzing Data

  • Perform statistical analysis and data visualization to compare the physiological responses derived from different running experiences. This analysis will focus on how HR, HRV, and user-reported mood or exertion align with specific musical selections.
  • Identify patterns and correlations within the data to gain insights into the impact of music on physiological responses during medium-distance runs.

Playlist Generation

  • Develop algorithms for playlist generation that align music with the runner's performance goals and physiological data. This includes adjusting the order of songs in an existing playlist or creating a new playlist dynamically.
  • The playlist generation process will remain adaptable, allowing for continuous refinement based on user feedback and physiological insights.

Performance Assessment

  • Apply statistical techniques to assess the effectiveness of the music recommendations in enhancing motivation and performance. Evaluate the correlations between physiological indicators, music choices, and performance outcomes.
  • Given the challenges mentioned above, we do not expect statistically significant results; the goal is a comparative baseline for further investigation.

 

Materials

Regarding hardware requirements, we anticipate using wearable devices owned either by the authors of this project or by the participants in the data collection experiments. We currently have a 4th-generation Apple Watch. For project enrichment and comparative analysis, an Empatica E4 might be requested, but it is not required for the project's success.

Data Sources

Regarding the primary data collection experiment, this study does not aim to generate a statistically significant data sample for the product design. Nevertheless, since both group members are active casual runners, we plan on collecting both of our data. As part of the Cal Running Club, we also plan on inviting other members to provide their running data to complement the analysis.

Additionally, in terms of external public datasets, the project might also use the raw data from the former social network Endomondo, which includes 253,020 workouts from 1,104 users [8]. This data still needs to be filtered to only running activities and adequately processed and should be used as a supporting dataset for testing and validating the recommendation algorithm.

Risk Management

Considering the short timeline of this project and the challenges of quantifying the efforts within each specific phase of this product, the team decided to define a scalable range of potential end-products:

  • [Baseline Idea] Existing Playlist Reordering: The algorithm will process and interpret the runner's data and analyze an existing playlist manually inserted in the system. The algorithm will rearrange the existing songs in a recommended order to support performance improvement, using the song's metadata, such as bpm, duration, and genre. The algorithm will obtain the metadata of the songs from an external data source [10] or a pre-loaded dictionary.
  • [Enhanced Concept] Existing Playlist Improvement: Similar to the option above, but instead of only reordering the songs, the algorithm will also use metadata-based parameters to identify songs that are not recommended for that runner's playlist and flag them to the user.
  • [Ambitious Product] Playlist Creation: Besides using the biodata collected, the algorithm will also use the external data source to suggest an entire playlist for the user. The playlist will be ordered, and it will be easy for the user to replicate it on their preferred streaming services.
  • [Moonshot] Integrated Playlist Creation (Spotify): Using the existing Spotify APIs [11], the notebook will not only define the songs and their orders for the runner but will also automatically create that playlist directly on Spotify. In this version of the product, more arguments could be taken into consideration for the playlist creation (favorite genre, Spotify liked songs, history, etc.) [12].
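As an illustration of the baseline idea, the sketch below reorders an existing playlist so that each running interval gets the unused song whose bpm is closest to that interval's target tempo. It is a minimal greedy sketch, not the project's actual algorithm: the function name `reorder_playlist`, the song dictionaries, and the target tempos are all hypothetical.

```python
def reorder_playlist(songs, target_bpms):
    """Greedily match one song to each interval by closest bpm.

    songs: list of dicts with at least a "bpm" key (hypothetical layout).
    target_bpms: desired tempo for each consecutive running interval.
    Songs left over after all intervals are served keep their original order.
    """
    remaining = list(songs)
    ordered = []
    for target in target_bpms:
        if not remaining:
            break
        # Pick the unused song whose tempo is nearest to this interval's target.
        best = min(remaining, key=lambda song: abs(song["bpm"] - target))
        ordered.append(best)
        remaining.remove(best)
    return ordered + remaining


# Hypothetical playlist metadata and per-interval tempo targets.
playlist = [
    {"title": "A", "bpm": 120},
    {"title": "B", "bpm": 170},
    {"title": "C", "bpm": 150},
]
reordered = reorder_playlist(playlist, target_bpms=[150, 165, 125])
print([song["title"] for song in reordered])
```

A greedy match is easy to reason about but not globally optimal; an assignment solver could replace it if the ordering quality matters.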

Challenges

One of the biggest challenges identified was that Apple significantly changed the structure of the HealthKit XML export in 2022. Variables essential to our analysis, such as total distance and total energy burned, changed substantially in where and how they are stored, and the old fields were deprecated. These changes made it much harder to reuse existing code to extract the data and made the learning curve steeper.

Moreover, we faced an extra challenge when running previously vetted scripts on our data, because our first export was generated with a particularly troublesome iOS 16 version [13] [14]. We had not anticipated that XML formatting changes of this kind would happen in a commercially established product line like Apple's. Going forward, this is a risk to the functionality of our solution, and a more robust data extraction process might be necessary for product longevity.
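One way to make the extraction more tolerant of format changes is to stream the export and read only the attributes that are needed, skipping any record that does not parse. The sketch below is a hedged example, not the notebook's code: it assumes heart rate samples appear as `Record` elements with `type="HKQuantityTypeIdentifierHeartRate"` and `startDate`/`value` attributes, and that timestamps follow the layout in `DATE_FORMAT`. The function name `iter_heart_rate` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from datetime import datetime

HR_TYPE = "HKQuantityTypeIdentifierHeartRate"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S %z"  # assumed timestamp layout of the export


def iter_heart_rate(source):
    """Yield (timestamp, bpm) pairs from an Apple Health export stream."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Record" and elem.get("type") == HR_TYPE:
            try:
                timestamp = datetime.strptime(elem.get("startDate"), DATE_FORMAT)
                yield timestamp, float(elem.get("value"))
            except (TypeError, ValueError):
                # Missing or malformed attributes: skip instead of failing.
                pass
        elem.clear()  # release memory; real exports can be very large


# Tiny hypothetical export with one valid sample, one other type, one bad date.
sample = io.BytesIO(b"""<HealthData>
<Record type="HKQuantityTypeIdentifierHeartRate" unit="count/min"
        startDate="2023-10-01 07:00:05 -0700" value="142"/>
<Record type="HKQuantityTypeIdentifierStepCount"
        startDate="2023-10-01 07:00:05 -0700" value="12"/>
<Record type="HKQuantityTypeIdentifierHeartRate"
        startDate="bad-date" value="150"/>
</HealthData>""")
readings = list(iter_heart_rate(sample))
print(readings)
```

Reading attributes by name rather than by position means that extra or reordered attributes in a future export do not break the parser, although renamed fields would still need a code change.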

Another challenge is that our current system is designed to create two-minute workout intervals, during which we collect various bio-signal data such as heart rate (HR), speed, and pace. The objective is to use these metrics to inform the selection of music tracks from Spotify that have a beats per minute (BPM) rate congruent with the user's workout intensity.

The primary issue we grappled with is developing an algorithm or set of rules that can objectively determine how the collected metrics should influence the choice of music. Each metric (heart rate, pace, total distance covered, and total energy burned) has the potential to affect song selection. For instance, a higher heart rate might suggest a preference for a faster BPM to match the workout's intensity, while a slower pace could indicate the need for a more relaxed tempo.

The challenge is multifaceted:

  • Quantitative to Qualitative Mapping: There is an inherent difficulty in translating quantitative data (like HR and pace) into the qualitative experience of music selection. The subjective nature of music enjoyment means that not all users will respond to tempo changes in the same way.
  • Weighting Different Metrics: We need to determine the relative importance of each metric in the song selection process. Should heart rate be the dominant factor, or should we consider pace and distance more heavily? The answer is not straightforward and requires a nuanced approach.
  • Individual Variability: Each user may have different preferences and physical responses to exercise, which means the system must be adaptable to individual needs and potentially learn from user feedback over time.

We explored various computational models and user feedback mechanisms to refine our approach and develop a robust solution that can dynamically adapt to the needs of the user.

Results

This project is the first step towards a more comprehensive product with an extensive list of potential additional features. Therefore, we included in this proposal an additional section to cover Risk Management, addressing a range of potential expected results that might not necessarily be achieved during this class but are part of the product roadmap.

Nevertheless, within the scope of the BioSensory Computing class, the final deliverable is a Python notebook that takes as input the historical biosignal running data collected from the wearable device, processes it (outlier cleaning, aggregation, metric calculation), and outputs a recommended playlist with songs in a specific order intended to improve the user's running experience. The algorithm takes into consideration data points such as:

  • Age (manual input).
  • Actual Heart Rate per interval (time or distance)
  • Actual Heart Rate zone (using Karvonen’s formula)
  • Target Running Pace per interval (time or distance)
  • Target Heart Rate Zone per age [9]

Here is a sample playlist that we generated from one of the running workouts: https://open.spotify.com/playlist/3AI3A2vA2FJ0zLmkQQpjqB

 

Model design

The model consists of multiple steps, from extracting, parsing, and processing the data in the Apple Health XML file all the way to generating the list of songs for the playlist on the streaming service Spotify. The following image briefly describes the flow of the model, followed by a more detailed explanation of how each node operates and which parameters are adopted.

Figure 1: Flowchart of the BioSync model

 

The current version of the model is initialized with two main user inputs: 

  1. The Apple Health Data Export file, generated through the Apple Health app and downloadable as an XML file.
  2. The manual input of the user's age, used to calculate the Maximum Heart Rate using the generic estimation formula: 

Maximum Heart Rate = 220 - age [13].

With the above information, the model calculates the Heart Rate Zones using Karvonen's method, which calculates the target heart rate through the following formula: 

Target HR = ((Max HR − Rest HR) × Intensity %) + Rest HR

The Max HR is estimated using the formula given under input 2 above, and the Rest HR is available in the Apple Health XML file, as presented in the Jupyter notebook.
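The two formulas above can be worked through in a few lines. This is a minimal sketch with hypothetical inputs (a 30-year-old runner with a resting heart rate of 60 bpm); the function names are illustrative and not taken from the notebook.

```python
def max_heart_rate(age):
    """Generic estimate: Maximum Heart Rate = 220 - age."""
    return 220 - age


def karvonen_target_hr(max_hr, rest_hr, intensity_pct):
    """Target HR = ((Max HR - Rest HR) x Intensity %) + Rest HR."""
    return (max_hr - rest_hr) * intensity_pct / 100 + rest_hr


# Hypothetical runner: 30 years old, resting heart rate of 60 bpm.
age, rest_hr = 30, 60
max_hr = max_heart_rate(age)  # 190 bpm

# Zone 2 spans 60% to 70% intensity.
zone_2 = (
    karvonen_target_hr(max_hr, rest_hr, 60),
    karvonen_target_hr(max_hr, rest_hr, 70),
)
print(zone_2)  # (138.0, 151.0)
```

For this runner the heart rate reserve is 190 − 60 = 130 bpm, so Zone 2 runs from 130 × 0.60 + 60 = 138 bpm to 130 × 0.70 + 60 = 151 bpm.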

Since the Apple Watch records heart rate at approximately 0.2 Hz, one new measurement is generated roughly every 5 seconds. As an aggregation approach, the model therefore splits each run into 3-minute intervals and uses the mean heart rate of each interval as the aggregated metric. The 3-minute duration was chosen to match the average length of a song in 2023, which is approximately 3 minutes and 15 seconds according to Thrasher of the Improve Songwriting portal [14].
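The aggregation step can be sketched as follows. At 0.2 Hz, a 180-second interval holds about 180 × 0.2 = 36 samples. The function name `aggregate_mean_hr`, the `(timestamp, bpm)` tuple layout, and the choice to measure intervals from the first reading are assumptions made for illustration; the notebook may implement this differently.

```python
from datetime import datetime, timedelta
from statistics import mean


def aggregate_mean_hr(readings, interval_seconds=180):
    """Group (timestamp, bpm) readings into fixed intervals and average them.

    Intervals are measured from the first reading (an assumption).
    Returns a list of (interval_start, mean_bpm) in chronological order.
    """
    if not readings:
        return []
    readings = sorted(readings)
    start = readings[0][0]
    buckets = {}
    for timestamp, bpm in readings:
        index = int((timestamp - start).total_seconds() // interval_seconds)
        buckets.setdefault(index, []).append(bpm)
    return [
        (start + timedelta(seconds=index * interval_seconds), mean(values))
        for index, values in sorted(buckets.items())
    ]


# Synthetic 6-minute run sampled every 5 seconds: 140 bpm, then 150 bpm.
t0 = datetime(2023, 10, 1, 7, 0, 0)
samples = [
    (t0 + timedelta(seconds=5 * i), 140 if i < 36 else 150) for i in range(72)
]
intervals = aggregate_mean_hr(samples)
print(intervals)
```

Intervals with no readings are simply absent from the result, which is worth keeping in mind if the watch drops samples during a run.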

As a reference for the heart rate zone boundaries, this report relies on common thresholds for each zone that are widely accepted by both scholars and industry experts. The following table displays these thresholds:

 

Heart Rate Zone    Heart Rate Range (Intensity %)
Zone 1             50% - 60%
Zone 2             60% - 70%
Zone 3             70% - 80%
Zone 4             80% - 90%
Zone 5             90% - 100%

Table 1: Five heart rate zones based on the intensity of training
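Table 1 can be read as a lookup from intensity percentage to zone. The sketch below inverts Karvonen's formula to obtain the intensity of a measured heart rate and then finds the matching row. How shared boundaries are assigned (here, to the higher zone) and the label for values outside 50% to 100% are assumptions, as are the function names and the example runner.

```python
# (zone name, lower bound %, upper bound %) as listed in Table 1.
ZONES = [
    ("Zone 1", 50, 60),
    ("Zone 2", 60, 70),
    ("Zone 3", 70, 80),
    ("Zone 4", 80, 90),
    ("Zone 5", 90, 100),
]


def intensity_pct(hr, rest_hr, max_hr):
    """Invert Karvonen's formula: share of heart rate reserve in use, in %."""
    return (hr - rest_hr) / (max_hr - rest_hr) * 100


def classify_zone(hr, rest_hr, max_hr):
    """Return the Table 1 zone for a heart rate, or "Others" if outside it."""
    pct = intensity_pct(hr, rest_hr, max_hr)
    for name, low, high in ZONES:
        # Lower bound inclusive; 100% itself is counted as Zone 5.
        if low <= pct < high or (high == 100 and pct == 100):
            return name
    return "Others"


# Hypothetical runner: resting HR 60 bpm, maximum HR 190 bpm.
for bpm in (100, 145, 170, 190):
    print(bpm, classify_zone(bpm, rest_hr=60, max_hr=190))
```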

With the aggregated data, the model identifies each interval's heart rate zone and determines how far it deviates from the target heart rate zone, which in this case is Zone 2. Each zone corresponds to a percentage band of the runner's heart rate reserve (Maximum Heart Rate minus Resting Heart Rate), added to the Resting Heart Rate, as in Karvonen's formula. The recommendation of Zone 2 is based on an extensive discussion of the ideal heart rate zone for each type of exercise. For medium and long runs, Zone 2 is widely regarded as a very beneficial target zone, in which the average runner should spend approximately 80% of their training time [15] [16] [17] [18]. The following image illustrates the heart rate in bpm over time during a 10k run. The bands in the chart highlight the heart rate zones.

Figure 2: Heart rate in bpm over time

       

Depending on the actual heart rate zone relative to the target zone, the model outputs one specific value for each parameter passed to Spotify's Get Recommendations API endpoint. These parameters are:

  1. Valence: Metric adopted by Spotify to gauge whether a song is likely to make someone feel happy (higher valence) or sad (lower valence). It varies from 0.0 to 1.0.
  2. Popularity: The Spotify Popularity Index is a score to rank how popular an artist or a song is relative to other artists or songs on Spotify. It varies from 0 to 100.
  3. Tempo: Song tempo measures the speed at which a song is played and is counted in beats per minute (bpm).
  4. Genre: The genre labeling in Spotify contains numerous possibilities. For the scope of this product, we decided to fix this as the most popular genre. According to Zipdo, in 2020, pop music was responsible for more than 33% of Spotify streams, making it the most popular genre on the platform [18]. 
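To make the four parameters concrete, the sketch below builds a request URL for Spotify's Get Recommendations endpoint using its `seed_genres`, `target_tempo`, `target_valence`, `target_popularity`, and `limit` query parameters. Whether the notebook uses the `target_*` variants or the `min_*`/`max_*` ones is not stated, so this is an assumption, and the values shown are hypothetical. The function only builds the URL; an actual call also needs an OAuth bearer token in the `Authorization` header.

```python
from urllib.parse import urlencode

RECOMMENDATIONS_URL = "https://api.spotify.com/v1/recommendations"


def build_recommendations_url(tempo, valence, popularity, genre="pop", limit=1):
    """Build the query URL for one recommendation request (no network call)."""
    params = {
        "seed_genres": genre,             # fixed to the most popular genre
        "target_tempo": tempo,            # beats per minute
        "target_valence": valence,        # 0.0 to 1.0
        "target_popularity": popularity,  # 0 to 100
        "limit": limit,                   # number of tracks to return
    }
    return f"{RECOMMENDATIONS_URL}?{urlencode(params)}"


# Hypothetical parameter values for a single interval.
url = build_recommendations_url(tempo=160, valence=0.9, popularity=80)
print(url)
```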

This study adopts the basic premise that adjusting these variables might give the user indirect cues that nudge them towards the ideal heart rate zone. For example, if a user is going through a period of very high heart rate, the recommended songs will have a lower tempo, lower valence, and even lower popularity, in an attempt to help reduce the user's heart rate.

For each of those variables, there is one associated value that is fed into Spotify's API. The following table explains how they are adjusted based on the heart rate zones and the previous values for each variable. 

HR Zone    Tempo Adjustment (%)    Valence    Popularity    Genre
Zone 1     +5%                     1.0        10            Pop
Zone 2     +2.5%                   1.0        10            Pop
Zone 3     -2.5%                   0.9        9             Pop
Zone 4     -5%                     0.8        8             Pop
Zone 5     -7.5%                   0.7        7             Pop
Others     No change

It is essential to acknowledge that fine-tuning these adjustments is part of the future work associated with this project. The values were selected somewhat arbitrarily and adjusted based on peer evaluation among the project authors. Deeper analysis and further studies are needed to arrive at more accurate values.
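The adjustment rules can be expressed as a small lookup. The sketch below rests on several assumptions that the text does not settle: the fifth row of the table is read as Zone 5, the tempo percentage is applied relative to the previous tempo, and the valence and popularity columns are taken as the values to send rather than as multipliers. The names `ADJUSTMENTS` and `next_parameters` are hypothetical.

```python
# Per-zone rules transcribed from the table (fifth row assumed to be Zone 5).
ADJUSTMENTS = {
    "Zone 1": {"tempo_change": 0.05, "valence": 1.0, "popularity": 10},
    "Zone 2": {"tempo_change": 0.025, "valence": 1.0, "popularity": 10},
    "Zone 3": {"tempo_change": -0.025, "valence": 0.9, "popularity": 9},
    "Zone 4": {"tempo_change": -0.05, "valence": 0.8, "popularity": 8},
    "Zone 5": {"tempo_change": -0.075, "valence": 0.7, "popularity": 7},
}


def next_parameters(zone, previous):
    """Return the parameters for the next song given the current zone.

    previous: dict with "tempo", "valence", "popularity", and "genre".
    Any zone not in the table ("Others") leaves the parameters unchanged.
    """
    rule = ADJUSTMENTS.get(zone)
    if rule is None:
        return dict(previous)
    return {
        # Assumption: the percentage is relative to the previous tempo.
        "tempo": previous["tempo"] * (1 + rule["tempo_change"]),
        # Assumption: these columns are used as-is, not as multipliers.
        "valence": rule["valence"],
        "popularity": rule["popularity"],
        "genre": "pop",
    }


# Hypothetical starting point: 160 bpm while the runner is in Zone 4.
previous = {"tempo": 160.0, "valence": 1.0, "popularity": 10, "genre": "pop"}
print(next_parameters("Zone 4", previous))
```

Because the tempo change compounds from one interval to the next, a long stretch in a high zone keeps lowering the tempo; a floor and ceiling on tempo would be a natural refinement.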

Finally, with the recommendations generated, the model consolidates the recommended songs into a list object, used to create a publicly available playlist directly on Spotify.
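The final step maps onto two Spotify Web API calls: creating a playlist for a user (`POST /v1/users/{user_id}/playlists`) and adding tracks to it (`POST /v1/playlists/{playlist_id}/tracks`). The sketch below only constructs the requests and does not send them; the helper names, user ID, playlist ID, track URIs, and token are placeholders, and obtaining an OAuth token with playlist-modification scope is outside its scope.

```python
import json
from urllib.request import Request

API = "https://api.spotify.com/v1"


def _post(url, payload, token):
    """Build (but do not send) an authenticated JSON POST request."""
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def create_playlist_request(user_id, name, token, public=True):
    """Request that creates an empty playlist owned by user_id."""
    return _post(f"{API}/users/{user_id}/playlists",
                 {"name": name, "public": public}, token)


def add_tracks_request(playlist_id, track_uris, token):
    """Request that appends tracks, in order, to an existing playlist."""
    return _post(f"{API}/playlists/{playlist_id}/tracks",
                 {"uris": list(track_uris)}, token)


# Placeholder identifiers; real values come from the Spotify account and API.
create_req = create_playlist_request("example_user", "RunSync 10k", "TOKEN")
add_req = add_tracks_request(
    "example_playlist_id",
    ["spotify:track:placeholder1", "spotify:track:placeholder2"],
    "TOKEN",
)
print(create_req.full_url)
print(add_req.full_url)
```

Passing either request to `urllib.request.urlopen` would perform the call; the playlist ID needed for the second request comes from the JSON response of the first.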

Explore our Code

Dive into the source code and documentation of our project on GitHub: RunSync Project Repository

Sources

[1] "Listening to Preferred Music Improved Running Performance without Changing the Pacing Pattern during a 6 Minute Run Test with Young Male Adults," accessed from source.

[2] "Runner's World Running Survey 2023: Results," accessed from source.

[3] "The Effect of Music Listening on Running Performance and Rating of Perceived Exertion of College Students," accessed from source.

[4] "Running to music combats mental fatigue, study suggests," accessed from source.

[5] "CDC - Measuring Physical Activity Intensity," accessed from source.

[6] "Perceived Exertion Scale - CDC," accessed from source.

[7] "Build Your Own Playlist Generator with Spotify's API in Python," accessed from source.

[8] "Fitrec Project - Google Sites," accessed from source.

[9] "Average Heart Rate While Running - Marathon Handbook," accessed from source.

[10] "GetSongBPM API," accessed from source.

[11] "Spotify Web API Reference - Create a Playlist," accessed from source.

[12] "Build Your Own Playlist Generator with Spotify's API in Python," accessed from source.

[13] Centers for Disease Control and Prevention. Measuring Physical Activity Intensity. Accessed from source.

[14] Improve Songwriting. (2023). How Long Should a Song Be? Accessed from source. 

[15] Marathon Handbook. (n.d.). Zone 2 Training: The Science & Benefits. Accessed from source.

[16] 220 Triathlon. (n.d.). Beginner’s Guide to Zone 2 Running. Accessed from source.

[17] WHOOP. (n.d.). Why Zone 2 Training is the Secret to Unlocking Peak Performance. Accessed from source.

[18] Zipdo. (n.d.). Spotify Genre Statistics. Accessed from source.

Ballmann, Christopher G. “The Influence of Music Preference on Exercise Responses and Performance: A Review.” Journal of functional morphology and kinesiology vol. 6,2 33. 8 Apr. 2021, doi:10.3390/jfmk6020033

Jianmo Ni, Larry Muhlstein, Julian McAuley, "Modeling heart rate and activity data for personalized fitness recommendation", in Proc. of the 2019 World Wide Web Conference (WWW'19), San Francisco, US, May. 2019.

Additional material

https://runningmagazine.ca/sections/training/heart-rate-and-pace-how-to-compare-these-metrics-to-improve-performance/

 

 

Last updated: December 12, 2023