Tracking Data API

TrackingData

databallpy.schemas.TrackingData

class TrackingData(*args, provider: str = 'unspecified', frame_rate: int | float = -999, **kwargs)

Bases: DataFrame

This is the tracking data class. It contains the tracking data for every frame, as well as the provider and frame_rate. Additionally, it contains some basic functions to add columns to the tracking data or to manipulate existing columns.

Parameters:
  • tracking_data (pd.DataFrame) – tracking data of the game

  • provider (str) – provider of the tracking data

  • frame_rate (int | float) – frame rate of the tracking data

add_acceleration(column_ids: str | list[str], filter_type: str | None = None, window_length: int = 25, polyorder: int = 2, max_acceleration: float = inf, allow_overwrite: bool = False) -> None

Function that adds acceleration columns to the tracking data based on the position columns.

Parameters:
  • column_ids (str | list[str]) – Columns for which acceleration should be calculated.

  • filter_type (str, optional) – Filter type to use. Defaults to None. Options are moving_average and savitzky_golay.

  • window_length (int, optional) – Window size for the filter. Defaults to 25.

  • polyorder (int, optional) – Polynomial order for the filter. Defaults to 2.

  • max_acceleration (float, optional) – Maximum value for the acceleration. Defaults to np.inf.

  • allow_overwrite (bool, optional) – Whether existing values may be overwritten. Defaults to False.

Returns:

None

Raises:
  • ValueError – If filter_type is not one of moving_average, savitzky_golay, or None.

  • ValueError – If no velocity is found in the DataFrame for the given column_ids.

Note

If “_acceleration” exists, but “_ax” and “_ay” do not, and allow_overwrite is False, “_ax” and “_ay” will be computed and added, but “_acceleration” is kept unchanged. Therefore, it may not correspond with the other values.

The function will delete acceleration columns if they already exist.
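The step from velocity to acceleration can be sketched without the library. The snippet below is a minimal illustration, not the databallpy implementation: the helper name differentiate, the central-difference scheme, and the choice to replace values above the maximum with NaN are all assumptions made for the example.

```python
import math

def differentiate(values, frame_rate, max_value=math.inf):
    """Derivative of a per-frame series, in units per second."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        # One-sided difference at the edges, central difference elsewhere.
        rate = (values[hi] - values[lo]) * frame_rate / (hi - lo)
        # Assumption: values beyond max_value are replaced with NaN.
        out.append(rate if abs(rate) <= max_value else math.nan)
    return out

frame_rate = 10  # frames per second (illustrative)
vx = [0.0, 0.5, 1.0, 1.5, 9.0]  # hypothetical "_vx" values in m/s
ax = differentiate(vx, frame_rate, max_value=30.0)
```

The last two entries of ax exceed the illustrative maximum of 30 m/s² and become NaN, which shows the role a parameter such as max_acceleration can play in suppressing tracking glitches.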

add_dangerous_accessible_space(mask: Series = None, **kwargs) -> None | DataFrame

Function to add a column ‘dangerous_accessible_space’ to the tracking data, indicating the accessible space weighted by the expected value (measured by xG) of the respective location.

Warning: Can be expensive, only use for frames that are needed.

SOURCE: Jonas Bischofberger, Arnold Baca. Dangerous Accessible Space: A Unified Model of Space and Value in Team Sports, 21 August 2025, PREPRINT (Version 1) available at Research Square [https://doi.org/10.21203/rs.3.rs-6932689/v1]

Parameters:

mask (Series) – Boolean filter to calculate fewer values.

Returns:

None

add_individual_player_possession(pz_radius: float = 1.5, bv_threshold: float = 5.0, ba_threshold: float = 10.0, min_frames_pz: int = 0) -> None

Function to calculate the individual player possession based on the tracking data. The method uses the methodology of the paper of Vidal-Codina et al. (2022): “Automatic Event Detection in Football Using Tracking Data”.

Parameters:
  • pz_radius (float, optional) – The radius of the possession zone constant. Defaults to 1.5.

  • bv_threshold (float, optional) – The ball velocity threshold in m/s. Defaults to 5.0.

  • ba_threshold (float, optional) – The ball angle threshold in degrees. Defaults to 10.0.

  • min_frames_pz (int, optional) – The minimum number of frames that the ball has to be in the possession zone to be considered as a possession. Defaults to 0.

Returns:

None
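The possession-zone idea behind pz_radius can be illustrated with a deliberately simplified sketch. It only checks whether the ball lies within pz_radius of the nearest player; the ball velocity and angle thresholds (bv_threshold, ba_threshold) and the min_frames_pz requirement are not modeled. The function name and the player ids are hypothetical.

```python
import math

def closest_player_in_zone(ball_xy, players_xy, pz_radius=1.5):
    """Return the nearest player's id if the ball is inside their zone, else None."""
    best_id, best_dist = None, math.inf
    for player_id, (px, py) in players_xy.items():
        dist = math.hypot(px - ball_xy[0], py - ball_xy[1])
        if dist < best_dist:
            best_id, best_dist = player_id, dist
    # Simplification: being inside the zone is treated as sufficient.
    return best_id if best_dist <= pz_radius else None

players = {"home_1": (10.0, 5.0), "away_7": (12.0, 5.0)}  # hypothetical ids
in_zone = closest_player_in_zone((10.5, 5.0), players)
nobody = closest_player_in_zone((30.0, 20.0), players)
```

Here in_zone is "home_1" because the ball is 0.5 m away from that player, while nobody is None because no player is within 1.5 m of the ball.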

add_team_possession(event_data: DataFrame, home_team_id: int, allow_overwrite: bool = False) -> None | DataFrame

Function to add a column ‘team_possession’ to the tracking data, indicating which team has possession of the ball at each frame, either ‘home’ or ‘away’.

Raises:
  • ValueError – If the tracking and event data are not synchronised.

  • ValueError – If the home_team_id is not in the event data.

Parameters:
  • event_data (EventData) – Event data for a game

  • home_team_id (int) – The ID of the home team.

  • allow_overwrite (bool, optional) – If the “team_possession” column has non-null values, allow_overwrite must be set to True before the function is executed. Defaults to False.

Returns:

None

add_velocity(column_ids: str | list[str], filter_type: str | None = None, window_length: int = 7, polyorder: int = 2, max_velocity: float = inf, allow_overwrite: bool = False) -> None

Function that adds velocity columns to the tracking data based on the position columns.

Parameters:
  • column_ids (str | list[str]) – Columns for which velocity should be calculated.

  • filter_type (str, optional) – Filter type to use. Defaults to None. Options are moving_average and savitzky_golay.

  • window_length (int, optional) – Window size for the filter. Defaults to 7.

  • polyorder (int, optional) – Polynomial order for the filter. Defaults to 2.

  • max_velocity (float, optional) – Maximum value for the velocity. Defaults to np.inf.

  • allow_overwrite (bool, optional) – Whether existing values may be overwritten. Defaults to False. Note: if “_velocity” exists but “_vx” and “_vy” do not, and allow_overwrite is False, “_vx” and “_vy” will be computed and added while “_velocity” is kept unchanged, so it may not correspond with the other values.

Returns:

None

Raises:

ValueError – If filter_type is not one of moving_average, savitzky_golay, or None.

Note

The function will delete the velocity columns for the given column_ids if they already exist.
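How the “_vx”, “_vy” and “_velocity” values relate to the position columns can be shown with a small dependency-free sketch. It assumes a plain forward difference between consecutive frames; the differencing scheme used by the library is not stated here, and the helper name forward_difference and the sample values are made up for the example.

```python
import math

def forward_difference(values, frame_rate):
    """Per-second rate of change between consecutive frames."""
    rates = [(b - a) * frame_rate for a, b in zip(values, values[1:])]
    return rates + [math.nan]  # the last frame has no successor

frame_rate = 8  # frames per second (illustrative)
x = [0.0, 0.375, 0.75]  # hypothetical "_x" positions in meters
y = [0.0, 0.5, 1.0]     # hypothetical "_y" positions in meters

vx = forward_difference(x, frame_rate)  # component along x, in m/s
vy = forward_difference(y, frame_rate)  # component along y, in m/s
# The velocity magnitude combines both components.
velocity = [math.hypot(a, b) for a, b in zip(vx, vy)]
```

With these numbers the components are 3 m/s and 4 m/s, giving a magnitude of 5 m/s. This also makes the caveat about allow_overwrite concrete: a stale “_velocity” column will generally not equal the magnitude of freshly computed “_vx” and “_vy” values.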

filter_tracking_data(column_ids: str | list[str], filter_type: str = 'savitzky_golay', window_length: int = 7, polyorder: int = 2) -> None

Function to filter tracking data in specified DataFrame columns.

Parameters:
  • column_ids (str | list[str]) – Column ID or list of column IDs to apply the filter to.

  • filter_type (str, optional) – Type of filter to use. Defaults to “savitzky_golay”. Options: {“moving_average”, “savitzky_golay”}.

  • window_length (int, optional) – Window length of the filter. Defaults to 7.

  • polyorder (int, optional) – Polyorder to use when the savitzky_golay filter is selected. Defaults to 2.

Returns:

None
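As an illustration of the moving_average option, the sketch below smooths a short series with a centered window. It is a minimal sketch under stated assumptions: the helper name is hypothetical, and truncating the window at the edges of the series is one possible choice that may differ from what the library does.

```python
def moving_average(values, window_length=7):
    """Centered moving average; windows are truncated at the series edges."""
    half = window_length // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(i - half, 0): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed

noisy_x = [0.0, 1.0, 5.0, 3.0, 4.0]  # hypothetical "_x" positions with a spike
smooth_x = moving_average(noisy_x, window_length=3)
```

The spike at the third frame is pulled from 5.0 down to 3.0. A Savitzky-Golay filter fits a polynomial of order polyorder inside each window instead of taking a plain mean, which tends to preserve peaks better.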

get_approximate_voronoi(pitch_dimensions: list[float, float], n_x_bins: int = 106, n_y_bins: int = 68, start_idx: int | None = None, end_idx: int | None = None) -> tuple[ndarray, ndarray]

Find the nearest player to each cell center in a grid of cells covering the pitch.

Parameters:
  • pitch_dimensions (list[float, float]) – The dimensions of the pitch.

  • n_x_bins (int, optional) – The number of cells in the width (x) direction. Defaults to 106.

  • n_y_bins (int, optional) – The number of cells in the height (y) direction. Defaults to 68.

  • start_idx (int, optional) – The starting index of the period. Defaults to None.

  • end_idx (int, optional) – The ending index of the period. Defaults to None.

Returns:

The distances to the nearest player for each cell center and the column ids of the nearest player. If tracking_data is a pd.Series, the shape will be (n_y_bins x n_x_bins), otherwise (len(tracking_data) x n_y_bins x n_x_bins).

Return type:

tuple[np.ndarray, np.ndarray]
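The grid-based approximation for a single frame can be sketched in plain Python. This is an illustration of the idea, not the library code: the function below is a stand-alone helper, the pitch is assumed to be centered on the origin, and the player ids and positions are invented.

```python
import math

def approximate_voronoi(players_xy, pitch_dimensions, n_x_bins, n_y_bins):
    """Return (distances, column_ids), each shaped n_y_bins x n_x_bins."""
    length, width = pitch_dimensions
    cell_w, cell_h = length / n_x_bins, width / n_y_bins
    distances, nearest = [], []
    for iy in range(n_y_bins):
        dist_row, id_row = [], []
        for ix in range(n_x_bins):
            # Cell center, assuming the pitch is centered on the origin.
            cx = -length / 2 + (ix + 0.5) * cell_w
            cy = -width / 2 + (iy + 0.5) * cell_h
            column_id, dist = min(
                ((cid, math.hypot(px - cx, py - cy))
                 for cid, (px, py) in players_xy.items()),
                key=lambda pair: pair[1],
            )
            dist_row.append(dist)
            id_row.append(column_id)
        distances.append(dist_row)
        nearest.append(id_row)
    return distances, nearest

players = {"home_1": (-25.0, 0.0), "away_1": (25.0, 3.0)}  # hypothetical ids
distances, nearest = approximate_voronoi(players, [100.0, 50.0], n_x_bins=2, n_y_bins=1)
```

With two cells, the left cell center coincides with home_1 (distance 0.0) and the right cell center is 3.0 m from away_1. Finer grids, such as the default 106 x 68, approximate the true Voronoi regions more closely at a higher cost.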

get_covered_distance(column_ids: list[str], velocity_intervals: tuple[float, ...] | tuple[tuple[float, ...], ...] = (), acceleration_intervals: tuple[float, ...] | tuple[tuple[float, ...], ...] = (), start_idx: int | None = None, end_idx: int | None = None) -> DataFrame

Calculates the distance covered based on the velocity magnitude at each frame.

This function requires the add_velocity function to be called. Optionally, it can also calculate the distance covered within specified velocity and/or acceleration intervals.

Parameters:
  • column_ids (list[str]) – Columns for which covered distance should be calculated.

  • velocity_intervals (optional) – Tuple that contains the velocity interval(s). Defaults to ().

  • acceleration_intervals (optional) – Tuple that contains the acceleration interval(s). Defaults to ().

  • start_idx (int, optional) – Start index of the tracking data. Defaults to None.

  • end_idx (int, optional) – End index of the tracking data. Defaults to None.

Returns:

DataFrame with the covered distance for each player. The columns are the player_ids and the rows are the covered distance for each player. If velocity_intervals or acceleration_intervals are provided, the columns will be the player_ids and the intervals. The rows will be the covered distance for each player within the specified intervals.

Return type:

pd.DataFrame

Notes

The function requires the velocity of every player, calculated with the add_velocity function. The acceleration of every player is only needed when acceleration intervals are provided in the input.
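The relation between velocity magnitude and covered distance can be sketched as a sum of speed times frame duration. The helper below is hypothetical, and treating an interval as half-open, [low, high), is an assumption made for the example rather than documented behavior.

```python
def covered_distance(velocity, frame_rate, interval=None):
    """Sum speed * frame duration, optionally only for speeds in [low, high)."""
    total = 0.0
    for v in velocity:
        if interval is None or interval[0] <= v < interval[1]:
            total += v / frame_rate  # meters covered during this frame
    return total

speed = [2.0, 4.0, 6.0, 8.0]  # hypothetical "_velocity" values in m/s
total = covered_distance(speed, frame_rate=10)
fast = covered_distance(speed, frame_rate=10, interval=(5.0, 10.0))
```

Over these four frames at 10 Hz the player covers about 2.0 m in total, of which about 1.4 m is covered at speeds between 5 and 10 m/s.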

get_pitch_control(pitch_dimensions: list[float, float], n_x_bins: int = 106, n_y_bins: int = 68, start_idx: int | None = None, end_idx: int | None = None) -> ndarray

Calculate the pitch control surface for a given period of time. The pitch control surface is the sum of the team influences of the two teams. The team influence is the sum of the individual player influences of the team. The player influence is calculated using the statistical technique presented in the article “Wide Open Spaces” by Fernandez & Born (2018). It incorporates the position, velocity, and distance to the ball of a given player to determine the influence degree at each location on the field. The bivariate normal distribution is utilized to model the player’s influence, and the result is normalized to obtain values within a [0, 1] range. The values are then passed through a sigmoid function to obtain the pitch control values within a [0, 1] range. Values near 1 indicate high pitch control by the home team, while values near 0 indicate high pitch control by the away team.

Parameters:
  • pitch_dimensions (list[float, float]) – The dimensions of the pitch.

  • n_x_bins (int, optional) – The number of cells in the width (x) direction. Defaults to 106.

  • n_y_bins (int, optional) – The number of cells in the height (y) direction. Defaults to 68.

  • start_idx (int, optional) – The starting index of the period. Defaults to None.

  • end_idx (int, optional) – The ending index of the period. Defaults to None.

Returns:

3D pitch control values across the grid. Size is (len(tracking_data), grid[0].shape[0], grid[0].shape[1]).

Return type:

np.ndarray
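The way player influences are combined and squashed into a [0, 1] value can be illustrated with a heavily simplified sketch for a single grid cell. Several assumptions are made: an isotropic Gaussian with a fixed sigma stands in for the bivariate normal that depends on velocity and distance to the ball, and the away influence is given a negative sign so that the sigmoid maps home-dominated cells toward 1. Function names and positions are hypothetical.

```python
import math

def influence(player_xy, cell_xy, sigma=5.0):
    """Simplified player influence: isotropic Gaussian of the distance."""
    d2 = (player_xy[0] - cell_xy[0]) ** 2 + (player_xy[1] - cell_xy[1]) ** 2
    return math.exp(-d2 / (2 * sigma ** 2))

def pitch_control(home, away, cell_xy):
    """Sigmoid of home minus away team influence at one cell."""
    home_influence = sum(influence(p, cell_xy) for p in home)
    away_influence = sum(influence(p, cell_xy) for p in away)
    return 1.0 / (1.0 + math.exp(-(home_influence - away_influence)))

home = [(0.0, 0.0)]   # hypothetical home player positions
away = [(10.0, 0.0)]  # hypothetical away player positions
near_home = pitch_control(home, away, (0.0, 0.0))
midpoint = pitch_control(home, away, (5.0, 0.0))
near_away = pitch_control(home, away, (10.0, 0.0))
```

The cell halfway between the two players evaluates to 0.5, the cell at the home player is above 0.5, and the cell at the away player is below 0.5, matching the interpretation of the values described above.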

get_pressure_on_player(index: int, column_id: str, pitch_size: list[float, float], d_front: str | float = 'variable', d_back: float = 3.0, q: float = 1.75) -> array

Function to calculate the pressure in accordance with “Visual Analysis of Pressure in Soccer”, Andrienko et al. (2016). In short: pressure is determined as the sum of the pressure of all opponents, which is a function of the angle and the distance to the player. This function calculates the pressure for a single player.

Parameters:
  • index (int) – Index of the frame for which to analyse pressure.

  • column_id (str) – Column name of the player to analyse.

  • pitch_size (list[float, float]) – Length and width of the pitch.

  • d_front (str | float, optional) – Distance in meters of the front of the pressure oval. If “variable”, d_front varies with the location on the field, following the article of Mat Herold et al. (2022). Defaults to “variable”.

  • d_back (float, optional) – Distance in meters of the back of the pressure oval. Defaults to 3.0.

  • q (float, optional) – Quotient of how fast pressure should increase/decrease as the distance to the player changes. Defaults to 1.75.

Returns:

Pressure on the player in the specified frame.

Return type:

np.array
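A sketch of how pressure can depend on angle and distance is given below. It uses the commonly cited oval-shaped pressure zone, whose constants should be checked against the paper before being relied on. The fixed d_front of 9.0, the use of the goal position to define the "front" direction, and all names are assumptions for illustration; the “variable” d_front option is not modeled.

```python
import math

def pressure_from_opponent(player_xy, opponent_xy, goal_xy,
                           d_front=9.0, d_back=3.0, q=1.75):
    """Pressure (0-100) that one opponent exerts on the player."""
    dx, dy = opponent_xy[0] - player_xy[0], opponent_xy[1] - player_xy[1]
    gx, gy = goal_xy[0] - player_xy[0], goal_xy[1] - player_xy[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        return 100.0
    # cos_angle is 1 when the opponent stands directly between player and goal.
    cos_angle = (dx * gx + dy * gy) / (dist * math.hypot(gx, gy))
    z = (1 + cos_angle) / 2
    # Reach of the pressure zone in the opponent's direction.
    limit = d_back + (d_front - d_back) * (z ** 3 + 0.3 * z) / 1.3
    return 0.0 if dist >= limit else (1 - dist / limit) ** q * 100

player, goal = (0.0, 0.0), (50.0, 0.0)
front = pressure_from_opponent(player, (4.5, 0.0), goal)
behind = pressure_from_opponent(player, (-4.5, 0.0), goal)
total_pressure = front + behind  # pressure is summed over all opponents
```

An opponent 4.5 m in front of the player is inside the zone and contributes roughly 30, while an opponent 4.5 m behind is outside the 3 m back reach and contributes nothing.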

to_long_format() -> DataFrame

Function that moves from the base format, with a row for every frame, to a long format, with a row for every frame/column_id combination. The ball/team information will be added to every row.

Return type:

pd.DataFrame
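The reshaping from one row per frame to one row per frame/column_id combination can be sketched with plain dictionaries standing in for DataFrame rows. The column names (home_1_x, ball_x, and so on), the helper name, and the output keys are assumptions for illustration; the exact columns of the library's long format are not specified here.

```python
def to_long(frames, ball_columns=("ball_x", "ball_y")):
    """Turn one dict per frame into one dict per frame/column_id pair."""
    long_rows = []
    for frame_idx, row in enumerate(frames):
        # Ball information is repeated on every row of the frame.
        shared = {name: row[name] for name in ball_columns}
        column_ids = sorted(
            {name.rsplit("_", 1)[0] for name in row if name not in ball_columns}
        )
        for column_id in column_ids:
            long_rows.append({
                "frame": frame_idx,
                "column_id": column_id,
                "x": row[f"{column_id}_x"],
                "y": row[f"{column_id}_y"],
                **shared,
            })
    return long_rows

frames = [
    {"ball_x": 0.0, "ball_y": 0.0, "home_1_x": 1.0, "home_1_y": 2.0,
     "away_1_x": 3.0, "away_1_y": 4.0},
    {"ball_x": 0.5, "ball_y": 0.1, "home_1_x": 1.1, "home_1_y": 2.1,
     "away_1_x": 3.1, "away_1_y": 4.1},
]
long_rows = to_long(frames)
```

Two frames with two players each yield four rows, and each row carries the ball position of its frame.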