.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "gallery/lesson7/plot_xG_tracking.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here ` to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_gallery_lesson7_plot_xG_tracking.py:

Expected Goals including player positions
=========================================

In this lesson, we go step by step through the process of building an expected goals model that uses additional information about the location of opposition players. This tutorial follows design choices similar to those of Javier Fernandez's expected goals model in `A framework for the fine-grained evaluation of the instantaneous expected value of soccer possessions `_. We will train a shallow neural network with the following features:

- ball location (x)
- binary variable signifying whether the ball was closer to the goal than the opponent's goalkeeper
- angle between the ball and the goal
- distance between the ball and the goal
- distance between the ball and the goalkeeper along the y-axis
- distance between the ball and the goalkeeper
- number of opponent players inside the triangle formed by the ball location and the opponent's goal posts
- number of opponent players less than 3 meters away from the ball location
- binary variable signifying whether the shot was a header
- expected goals based on the distance to goal and the angle between the ball and the goal

.. GENERATED FROM PYTHON SOURCE LINES 23-46

..
.. code-block:: default

    #importing necessary libraries
    from mplsoccer import Sbopen
    import pandas as pd
    import numpy as np
    import warnings
    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    import matplotlib.pyplot as plt
    import os
    import random as rn
    import tensorflow as tf
    #warnings not visible on the course webpage
    pd.options.mode.chained_assignment = None
    warnings.filterwarnings('ignore')

    #setting random seeds so that the results are reproducible on the webpage
    os.environ['PYTHONHASHSEED'] = '0'
    os.environ['CUDA_VISIBLE_DEVICES'] = ''
    np.random.seed(1)
    rn.seed(1)
    tf.random.set_seed(1)
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

.. GENERATED FROM PYTHON SOURCE LINES 47-52

Opening data
----------------------------

For this task we will use the StatsBomb Indian Super League 2021/2022 data, since it is the only openly available dataset that contains both event and tracking data for an entire season. We open each game and store the data for the entire season in the dataframes *shot_df* and *track_df*. We also convert yards to meters. Finally, we keep only open-play shots and remove shots for which the goalkeeper was not tracked.

.. GENERATED FROM PYTHON SOURCE LINES 52-87

..
.. code-block:: default

    parser = Sbopen()
    #get list of games during Indian Super League season
    df_match = parser.match(competition_id=1238, season_id=108)

    matches = df_match.match_id.unique()

    shot_df = pd.DataFrame()
    track_df = pd.DataFrame()
    #store data in one dataframe
    for match in matches:
        #open events
        df_event = parser.event(match)[0]
        #open 360 data
        df_track = parser.event(match)[2]
        #get shots
        shots = df_event.loc[df_event["type_name"] == "Shot"]
        #convert yards (120 x 80 pitch) to meters (105 x 68 pitch)
        shots.x = shots.x.apply(lambda cell: cell*105/120)
        shots.y = shots.y.apply(lambda cell: cell*68/80)
        df_track.x = df_track.x.apply(lambda cell: cell*105/120)
        df_track.y = df_track.y.apply(lambda cell: cell*68/80)
        #append event and trackings to a dataframe
        shot_df = pd.concat([shot_df, shots], ignore_index = True)
        track_df = pd.concat([track_df, df_track], ignore_index = True)

    #reset indices
    shot_df.reset_index(drop=True, inplace=True)
    track_df.reset_index(drop=True, inplace=True)
    #filter out non open-play shots
    shot_df = shot_df.loc[shot_df["sub_type_name"] == "Open Play"]
    #filter out shots where goalkeeper was not tracked
    gks_tracked = track_df.loc[track_df["teammate"] == False].loc[track_df["position_name"] == "Goalkeeper"]['id'].unique()
    shot_df = shot_df.loc[shot_df["id"].isin(gks_tracked)]

.. GENERATED FROM PYTHON SOURCE LINES 88-92

Feature engineering
----------------------------

In this section we create the features described above and store them in the *model_vars* dataframe. We suggest reading the code comments to better understand this part of the tutorial.

.. GENERATED FROM PYTHON SOURCE LINES 92-213

..
.. code-block:: default

    #take important variables from shot dataframe
    model_vars = shot_df[["id", "index", "x", "y"]]
    #get the dependent variable
    model_vars["goal"] = shot_df.outcome_name.apply(lambda cell: 1 if cell == "Goal" else 0)
    #change the dependent variable to object for basic xG modelling
    model_vars["goal_smf"] = model_vars["goal"].astype(object)
    # ball location (x)
    model_vars['x0'] = model_vars.x
    # x to calculate angle and distance
    model_vars["x"] = model_vars.x.apply(lambda cell: 105-cell)
    # c to calculate angle and distance between ball and the goal as in Lesson 2
    model_vars["c"] = model_vars.y.apply(lambda cell: abs(34-cell))
    #calculating angle and distance as in Lesson 2
    #the arctan is computed once; negative values are shifted by pi before converting to degrees
    angle_raw = np.arctan(7.32 * model_vars["x"] / (model_vars["x"]**2 + model_vars["c"]**2 - (7.32/2)**2))
    model_vars["angle"] = np.where(angle_raw >= 0, angle_raw, angle_raw + np.pi)*180/np.pi
    model_vars["distance"] = np.sqrt(model_vars["x"]**2 + model_vars["c"]**2)

    #calculating basic xG using logistic regression
    def params(df):
        test_model = smf.glm(formula="goal_smf ~ angle + distance", data=df,
                             family=sm.families.Binomial()).fit()
        #return fitted coefficients (intercept, angle, distance)
        return test_model.params

    def calculate_xG(sh, b):
        #positional access, since the coefficients are labelled by name
        bsum=b.iloc[0]
        for i,v in enumerate(["angle", "distance"]):
            bsum=bsum+b.iloc[i+1]*sh[v]
        #positive exponent as in Lesson 2: with an object-typed target the fitted
        #coefficients appear to describe the first category (0, no goal)
        xG = 1/(1+np.exp(bsum))
        return xG

    #expected goals based on distance to goal and angle between the ball and the goal
    b = params(model_vars)
    model_vars["xg_basic"]= model_vars.apply(calculate_xG, b = b, axis=1)

    #ball_goalkeeper distance
    def dist_to_gk(test_shot, track_df):
        #get id of the shot to search for tracking data using this index
        test_shot_id = test_shot["id"]
        #check goalkeeper position
        gk_pos = track_df.loc[track_df["id"] == test_shot_id].loc[track_df["teammate"] == False].loc[track_df["position_name"] == "Goalkeeper"][["x", "y"]]
        #calculate distance from event to goalkeeper
        # position
        dist = np.sqrt((test_shot["x"] - gk_pos["x"])**2 + (test_shot["y"] - gk_pos["y"])**2)
        return dist.iloc[0]

    #store distance from event to goalkeeper position in a dataframe
    model_vars["gk_distance"] = shot_df.apply(dist_to_gk, track_df = track_df, axis = 1)

    #ball goalkeeper y axis
    def y_to_gk(test_shot, track_df):
        #get id of the shot to search for tracking data using this index
        test_shot_id = test_shot["id"]
        #get goalkeeper position (y coordinate only)
        gk_pos = track_df.loc[track_df["id"] == test_shot_id].loc[track_df["teammate"] == False].loc[track_df["position_name"] == "Goalkeeper"][["y"]]
        #calculate distance from event to goalkeeper position in y axis
        dist = abs(test_shot["y"] - gk_pos["y"])
        return dist.iloc[0]

    #store distance in y axis from event to goalkeeper position in a dataframe
    model_vars["gk_distance_y"] = shot_df.apply(y_to_gk, track_df = track_df, axis = 1)

    #number of players less than 3 meters away from the ball
    def three_meters_away(test_shot, track_df):
        #get id of the shot to search for tracking data using this index
        test_shot_id = test_shot["id"]
        #get all opposition's player location
        player_position = track_df.loc[track_df["id"] == test_shot_id].loc[track_df["teammate"] == False][["x", "y"]]
        #calculate their distance to the ball
        dist = np.sqrt((test_shot["x"] - player_position["x"])**2 + (test_shot["y"] - player_position["y"])**2)
        #return how many are closer to the ball than 3 meters
        return len(dist[dist<3])

    #store number of opposition's players closer than 3 meters in a dataframe
    model_vars["close_players"] = shot_df.apply(three_meters_away, track_df = track_df, axis = 1)

    #number of players inside a triangle
    def players_in_triangle(test_shot, track_df):
        #get id of the shot to search for tracking data using this index
        test_shot_id = test_shot["id"]
        #get all opposition's player location
        player_position = track_df.loc[track_df["id"] == test_shot_id].loc[track_df["teammate"] == False][["x", "y"]]
        #checking if point inside a triangle
        #the two goal posts (x1, y1), (x2, y2) and the ball (x3, y3) are the triangle vertices
        x1 = 105
        y1 = 34 - 7.32/2
        x2 = 105
        y2 = 34 + 7.32/2
        x3 = test_shot["x"]
        y3 = test_shot["y"]
        xp = player_position["x"]
        yp = player_position["y"]
        #each cross product tells on which side of one triangle edge a player stands
        c1 = (x2-x1)*(yp-y1)-(y2-y1)*(xp-x1)
        c2 = (x3-x2)*(yp-y2)-(y3-y2)*(xp-x2)
        c3 = (x1-x3)*(yp-y3)-(y1-y3)*(xp-x3)
        #a player is inside when all three cross products share the same sign
        #get number of players inside a triangle
        return len(player_position.loc[((c1<0) & (c2<0) & (c3<0)) | ((c1>0) & (c2>0) & (c3>0))])

    #store number of opposition's players inside a triangle in a dataframe
    model_vars["triangle"] = shot_df.apply(players_in_triangle, track_df = track_df, axis = 1)

    #goalkeeper distance to goal
    def gk_dist_to_goal(test_shot, track_df):
        #get id of the shot to search for tracking data using this index
        test_shot_id = test_shot["id"]
        #get goalkeeper position
        gk_pos = track_df.loc[track_df["id"] == test_shot_id].loc[track_df["teammate"] == False].loc[track_df["position_name"] == "Goalkeeper"][["x", "y"]]
        #calculate their distance to goal
        dist = np.sqrt((105 -gk_pos["x"])**2 + (34 - gk_pos["y"])**2)
        return dist.iloc[0]

    #store opposition's goalkeeper distance to goal in a dataframe
    model_vars["gk_dist_to_goal"] = shot_df.apply(gk_dist_to_goal, track_df = track_df, axis = 1)

    #create binary variable 1 if ball is closer to the goal than goalkeeper
    model_vars["is_closer"] = np.where(model_vars["gk_dist_to_goal"] > model_vars["distance"], 1, 0)
    #create binary variable 1 if header
    model_vars["header"] = shot_df.body_part_name.apply(lambda cell: 1 if cell == "Head" else 0)

    #store dependent variable in a numpy array
    y = model_vars["goal"].values
    #store independent variables in a numpy array
    X = model_vars[["x0", "is_closer", "angle", "distance", "gk_distance", "gk_distance_y", "triangle", "close_players", "header", "xg_basic"]].values

.. GENERATED FROM PYTHON SOURCE LINES 214-221

Training neural network
----------------------------

With the features created, we can now train a neural network. We split the data into 60% training, 20% validation and 20% test sets. Then, we scale the inputs.
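
As a quick check of these proportions, the two-step split can be tried on synthetic data first. The sketch below is only an illustration: the array sizes, the 10% share of positive labels and all names ending in ``_demo`` or starting with ``X_tr``, ``X_va``, ``X_te`` are made up and are not taken from the shot data. Holding out 40% of the rows and then halving the held-out part gives the 60/20/20 split, and the scaler is fitted on the training part only, so that no information from the validation and test parts leaks into it.

.. code-block:: python

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    # synthetic stand-in for the feature matrix and the goal labels (illustrative only)
    rng = np.random.default_rng(0)
    X_demo = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
    y_demo = np.array([1] * 100 + [0] * 900)

    # step 1: keep 60% of the rows for training and hold out the remaining 40%
    X_tr, X_rest, y_tr, y_rest = train_test_split(
        X_demo, y_demo, train_size=0.6, random_state=123, stratify=y_demo)
    # step 2: halve the held-out part into test and validation sets (20% each)
    X_te, X_va, y_te, y_va = train_test_split(
        X_rest, y_rest, train_size=0.5, random_state=123, stratify=y_rest)

    # fit the scaler on the training part only, then reuse it for the other parts
    scaler_demo = StandardScaler()
    X_tr_s = scaler_demo.fit_transform(X_tr)
    X_va_s = scaler_demo.transform(X_va)
    X_te_s = scaler_demo.transform(X_te)

    # sizes of the three parts and the share of positive labels in each
    print(len(X_tr), len(X_va), len(X_te))
    print(y_tr.mean(), y_va.mean(), y_te.mean())

Because both splits are stratified, the share of positive labels stays close to 10% in every part, which matters for a rare outcome such as a goal.
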
As the next step, we create a neural network model. It follows design choices similar to Javier Fernandez's: two dense layers of size 10, each followed by a ReLU activation, and a final layer of size 1 with a sigmoid activation to compute the probabilities. Our model optimizes the Brier score using the Adam optimizer with a learning rate of 0.001 and default betas. As suggested, we use early stopping with a minimum delta of 1e-5 and a batch size of 16. However, we also set the patience to 50 so that training does not stop the first time the validation loss fails to improve.

.. GENERATED FROM PYTHON SOURCE LINES 221-277

.. code-block:: default

    #import machine learning libraries
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import Dense
    from tensorflow.keras.optimizers import Adam
    from tensorflow.keras.callbacks import EarlyStopping

    #split the data into train, validation and test sets
    #X_cal and y_cal form the held-out test set used for the assessment later on
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.6, random_state = 123, stratify = y)
    X_cal, X_val, y_cal, y_val = train_test_split(X_test, y_test, train_size = 0.5, random_state = 123, stratify = y_test)
    #scale data
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_val = scaler.transform(X_val)
    X_cal = scaler.transform(X_cal)

    #creating a function with a model architecture
    def create_model():
        model = Sequential([
            Dense(10, activation='relu'),
            Dense(10, activation='relu'),
            Dense(1, activation = 'sigmoid'),
        ])
        opt = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)
        #the mean squared error of predicted probabilities is the Brier score
        model.compile(optimizer=opt, loss="mean_squared_error" , metrics=['accuracy'])
        return model

    #create model
    model = create_model()

    #create an early stopping object
    callback = EarlyStopping(min_delta=1e-5, patience = 50, mode = "min", monitor = "val_loss", restore_best_weights=True)

    #fit the model
    history = model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=1000, verbose=1, batch_size=16, callbacks =
        [callback])

    fig, axs = plt.subplots(2, figsize=(10,12))
    #plot training history - accuracy
    axs[0].plot(history.history['accuracy'], label='train')
    axs[0].plot(history.history['val_accuracy'], label='validation')
    axs[0].set_title("Accuracy at each epoch")
    axs[0].set_xlabel("Epoch")
    axs[0].set_ylabel("Accuracy")
    axs[0].legend()

    #plot training history - loss function
    axs[1].plot(history.history['loss'], label='train')
    axs[1].plot(history.history['val_loss'], label='validation')
    axs[1].legend()
    axs[1].set_title("Loss at each epoch")
    axs[1].set_xlabel("Epoch")
    axs[1].set_ylabel("MSE")
    plt.show()

.. image-sg:: /gallery/lesson7/images/sphx_glr_plot_xG_tracking_001.png
   :alt: Accuracy at each epoch, Loss at each epoch
   :srcset: /gallery/lesson7/images/sphx_glr_plot_xG_tracking_001.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Epoch 1/1000
    1/108 [..............................] - ETA: 24s - loss: 0.2296 - accuracy: 0.6250
    104/108 [===========================>..] - ETA: 0s - loss: 0.1468 - accuracy: 0.8858
    108/108 [==============================] - 0s 1ms/step - loss: 0.1457 - accuracy: 0.8864 - val_loss: 0.1071 - val_accuracy: 0.9009
    Epoch 2/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0699 - accuracy: 0.9375
    108/108 [==============================] - 0s 665us/step - loss: 0.0967 - accuracy: 0.8980 - val_loss: 0.0894 - val_accuracy: 0.9009
    Epoch 3/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0502 - accuracy: 0.9375
    107/108 [============================>.] - ETA: 0s - loss: 0.0889 - accuracy: 0.8984
    108/108 [==============================] - 0s 682us/step - loss: 0.0893 - accuracy: 0.8980 - val_loss: 0.0847 - val_accuracy: 0.9026
    Epoch 4/1000
    1/108 [..............................]
    - ETA: 0s - loss: 0.1943 - accuracy: 0.8125
    108/108 [==============================] - 0s 659us/step - loss: 0.0864 - accuracy: 0.8986 - val_loss: 0.0817 - val_accuracy: 0.9026
    Epoch 5/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1965 - accuracy: 0.7500
    108/108 [==============================] - 0s 658us/step - loss: 0.0847 - accuracy: 0.8986 - val_loss: 0.0802 - val_accuracy: 0.9043
    Epoch 6/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0187 - accuracy: 1.0000
    108/108 [==============================] - 0s 662us/step - loss: 0.0840 - accuracy: 0.8991 - val_loss: 0.0793 - val_accuracy: 0.9043
    Epoch 7/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0597 - accuracy: 0.9375
    108/108 [==============================] - 0s 672us/step - loss: 0.0835 - accuracy: 0.8991 - val_loss: 0.0788 - val_accuracy: 0.9043
    Epoch 8/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0266 - accuracy: 0.9375
    108/108 [==============================] - ETA: 0s - loss: 0.0830 - accuracy: 0.8980
    108/108 [==============================] - 0s 679us/step - loss: 0.0830 - accuracy: 0.8980 - val_loss: 0.0781 - val_accuracy: 0.9096
    Epoch 9/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0208 - accuracy: 1.0000
    108/108 [==============================] - 0s 635us/step - loss: 0.0825 - accuracy: 0.9009 - val_loss: 0.0771 - val_accuracy: 0.9113
    Epoch 10/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0302 - accuracy: 0.9375
    108/108 [==============================] - 0s 647us/step - loss: 0.0823 - accuracy: 0.9003 - val_loss: 0.0767 - val_accuracy: 0.9113
    Epoch 11/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0269 - accuracy: 0.9375
    108/108 [==============================] - ETA: 0s - loss: 0.0821 - accuracy: 0.9003
    108/108 [==============================] - 0s 659us/step - loss: 0.0821 - accuracy: 0.9003 - val_loss: 0.0757 - val_accuracy: 0.9113
    Epoch 12/1000
    1/108 [..............................] - ETA: 0s - loss: 0.2341 - accuracy: 0.6875
    108/108 [==============================] - 0s 642us/step - loss: 0.0819 - accuracy: 0.9026 - val_loss: 0.0759 - val_accuracy: 0.9130
    Epoch 13/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0247 - accuracy: 1.0000
    108/108 [==============================] - 0s 640us/step - loss: 0.0816 - accuracy: 0.9020 - val_loss: 0.0757 - val_accuracy: 0.9113
    Epoch 14/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0237 - accuracy: 1.0000
    108/108 [==============================] - 0s 664us/step - loss: 0.0816 - accuracy: 0.9020 - val_loss: 0.0753 - val_accuracy: 0.9130
    Epoch 15/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0691 - accuracy: 0.9375
    108/108 [==============================] - 0s 677us/step - loss: 0.0815 - accuracy: 0.9009 - val_loss: 0.0753 - val_accuracy: 0.9148
    Epoch 16/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0591 - accuracy: 0.9375
    108/108 [==============================] - 0s 675us/step - loss: 0.0815 - accuracy: 0.9009 - val_loss: 0.0760 - val_accuracy: 0.9096
    Epoch 17/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0525 - accuracy: 0.9375
    108/108 [==============================] - 0s 660us/step - loss: 0.0814 - accuracy: 0.9026 - val_loss: 0.0755 - val_accuracy: 0.9130
    Epoch 18/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1084 - accuracy: 0.8750
    108/108 [==============================] - 0s 639us/step - loss: 0.0813 - accuracy: 0.9026 - val_loss: 0.0755 - val_accuracy: 0.9113
    Epoch 19/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0517 - accuracy: 0.9375
    108/108 [==============================] - 0s 642us/step - loss: 0.0811 - accuracy: 0.9032 - val_loss: 0.0757 - val_accuracy: 0.9130
    Epoch 20/1000
    1/108 [..............................] - ETA: 0s - loss: 0.2246 - accuracy: 0.7500
    108/108 [==============================] - 0s 648us/step - loss: 0.0813 - accuracy: 0.9026 - val_loss: 0.0755 - val_accuracy: 0.9096
    Epoch 21/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0811 - accuracy: 0.8750
    108/108 [==============================] - 0s 626us/step - loss: 0.0810 - accuracy: 0.9032 - val_loss: 0.0752 - val_accuracy: 0.9113
    Epoch 22/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1138 - accuracy: 0.8750
    108/108 [==============================] - 0s 654us/step - loss: 0.0809 - accuracy: 0.9032 - val_loss: 0.0754 - val_accuracy: 0.9113
    Epoch 23/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1064 - accuracy: 0.8750
    108/108 [==============================] - 0s 670us/step - loss: 0.0810 - accuracy: 0.9020 - val_loss: 0.0753 - val_accuracy: 0.9113
    Epoch 24/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0317 - accuracy: 1.0000
    108/108 [==============================] - 0s 660us/step - loss: 0.0809 - accuracy: 0.9038 - val_loss: 0.0754 - val_accuracy: 0.9096
    Epoch 25/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1475 - accuracy: 0.8125
    107/108 [============================>.] - ETA: 0s - loss: 0.0806 - accuracy: 0.9036
    108/108 [==============================] - 0s 659us/step - loss: 0.0809 - accuracy: 0.9032 - val_loss: 0.0751 - val_accuracy: 0.9113
    Epoch 26/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1299 - accuracy: 0.8750
    108/108 [==============================] - 0s 636us/step - loss: 0.0807 - accuracy: 0.9038 - val_loss: 0.0753 - val_accuracy: 0.9113
    Epoch 27/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1132 - accuracy: 0.8750
    108/108 [==============================] - 0s 650us/step - loss: 0.0806 - accuracy: 0.9038 - val_loss: 0.0751 - val_accuracy: 0.9130
    Epoch 28/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1029 - accuracy: 0.8750
    108/108 [==============================] - 0s 648us/step - loss: 0.0806 - accuracy: 0.9026 - val_loss: 0.0752 - val_accuracy: 0.9113
    Epoch 29/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0951 - accuracy: 0.8750
    108/108 [==============================] - 0s 630us/step - loss: 0.0806 - accuracy: 0.9032 - val_loss: 0.0754 - val_accuracy: 0.9130
    Epoch 30/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0678 - accuracy: 0.8750
    108/108 [==============================] - 0s 625us/step - loss: 0.0806 - accuracy: 0.9026 - val_loss: 0.0751 - val_accuracy: 0.9113
    Epoch 31/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1952 - accuracy: 0.8125
    108/108 [==============================] - 0s 651us/step - loss: 0.0806 - accuracy: 0.9032 - val_loss: 0.0751 - val_accuracy: 0.9113
    Epoch 32/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0674 - accuracy: 0.9375
    108/108 [==============================] - ETA: 0s - loss: 0.0805 - accuracy: 0.9038
    108/108 [==============================] - 0s 682us/step - loss: 0.0805 - accuracy: 0.9038 - val_loss: 0.0754 - val_accuracy: 0.9113
    Epoch 33/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1174 - accuracy: 0.8125
    108/108 [==============================] - 0s 649us/step - loss: 0.0805 - accuracy: 0.9043 - val_loss: 0.0750 - val_accuracy: 0.9113
    Epoch 34/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1408 - accuracy: 0.8125
    108/108 [==============================] - 0s 680us/step - loss: 0.0804 - accuracy: 0.9032 - val_loss: 0.0751 - val_accuracy: 0.9113
    Epoch 35/1000
    1/108 [..............................] - ETA: 0s - loss: 0.2623 - accuracy: 0.6250
    108/108 [==============================] - 0s 634us/step - loss: 0.0803 - accuracy: 0.9038 - val_loss: 0.0750 - val_accuracy: 0.9130
    Epoch 36/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1920 - accuracy: 0.7500
    108/108 [==============================] - 0s 666us/step - loss: 0.0802 - accuracy: 0.9032 - val_loss: 0.0751 - val_accuracy: 0.9113
    Epoch 37/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1077 - accuracy: 0.8750
    108/108 [==============================] - 0s 646us/step - loss: 0.0803 - accuracy: 0.9026 - val_loss: 0.0752 - val_accuracy: 0.9113
    Epoch 38/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0077 - accuracy: 1.0000
    108/108 [==============================] - 0s 668us/step - loss: 0.0801 - accuracy: 0.9026 - val_loss: 0.0753 - val_accuracy: 0.9096
    Epoch 39/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1236 - accuracy: 0.8125
    108/108 [==============================] - 0s 662us/step - loss: 0.0802 - accuracy: 0.9020 - val_loss: 0.0751 - val_accuracy: 0.9113
    Epoch 40/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0157 - accuracy: 1.0000
    108/108 [==============================] - 0s 615us/step - loss: 0.0802 - accuracy: 0.9032 - val_loss: 0.0751 - val_accuracy: 0.9130
    Epoch 41/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1408 - accuracy: 0.8125
    108/108 [==============================] - 0s 662us/step - loss: 0.0801 - accuracy: 0.9038 - val_loss: 0.0750 - val_accuracy: 0.9113
    Epoch 42/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0651 - accuracy: 0.9375
    108/108 [==============================] - 0s 644us/step - loss: 0.0803 - accuracy: 0.9026 - val_loss: 0.0751 - val_accuracy: 0.9130
    Epoch 43/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0157 - accuracy: 1.0000
    108/108 [==============================] - 0s 654us/step - loss: 0.0800 - accuracy: 0.9032 - val_loss: 0.0752 - val_accuracy: 0.9113
    Epoch 44/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1079 - accuracy: 0.8750
    108/108 [==============================] - 0s 672us/step - loss: 0.0799 - accuracy: 0.9026 - val_loss: 0.0751 - val_accuracy: 0.9113
    Epoch 45/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0548 - accuracy: 0.9375
    108/108 [==============================] - 0s 666us/step - loss: 0.0802 - accuracy: 0.9026 - val_loss: 0.0753 - val_accuracy: 0.9096
    Epoch 46/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0141 - accuracy: 1.0000
    108/108 [==============================] - 0s 669us/step - loss: 0.0799 - accuracy: 0.9032 - val_loss: 0.0754 - val_accuracy: 0.9096
    Epoch 47/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0411 - accuracy: 0.9375
    107/108 [============================>.] - ETA: 0s - loss: 0.0804 - accuracy: 0.9030
    108/108 [==============================] - 0s 687us/step - loss: 0.0798 - accuracy: 0.9038 - val_loss: 0.0753 - val_accuracy: 0.9130
    Epoch 48/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0878 - accuracy: 0.8750
    108/108 [==============================] - ETA: 0s - loss: 0.0799 - accuracy: 0.9020
    108/108 [==============================] - 0s 680us/step - loss: 0.0799 - accuracy: 0.9020 - val_loss: 0.0752 - val_accuracy: 0.9113
    Epoch 49/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0053 - accuracy: 1.0000
    108/108 [==============================] - 0s 646us/step - loss: 0.0798 - accuracy: 0.9038 - val_loss: 0.0753 - val_accuracy: 0.9096
    Epoch 50/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0154 - accuracy: 1.0000
    108/108 [==============================] - 0s 636us/step - loss: 0.0797 - accuracy: 0.9032 - val_loss: 0.0755 - val_accuracy: 0.9096
    Epoch 51/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0283 - accuracy: 0.9375
    108/108 [==============================] - 0s 643us/step - loss: 0.0798 - accuracy: 0.9043 - val_loss: 0.0755 - val_accuracy: 0.9096
    Epoch 52/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0592 - accuracy: 0.9375
    108/108 [==============================] - 0s 671us/step - loss: 0.0797 - accuracy: 0.9032 - val_loss: 0.0754 - val_accuracy: 0.9096
    Epoch 53/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0540 - accuracy: 0.9375
    108/108 [==============================] - 0s 666us/step - loss: 0.0796 - accuracy: 0.9026 - val_loss: 0.0755 - val_accuracy: 0.9096
    Epoch 54/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1292 - accuracy: 0.8750
    108/108 [==============================] - 0s 663us/step - loss: 0.0797 - accuracy: 0.9020 - val_loss: 0.0754 - val_accuracy: 0.9113
    Epoch 55/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1101 - accuracy: 0.8750
    108/108 [==============================] - 0s 682us/step - loss: 0.0796 - accuracy: 0.9038 - val_loss: 0.0754 - val_accuracy: 0.9096
    Epoch 56/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0112 - accuracy: 1.0000
    108/108 [==============================] - 0s 672us/step - loss: 0.0795 - accuracy: 0.9026 - val_loss: 0.0756 - val_accuracy: 0.9078
    Epoch 57/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0173 - accuracy: 1.0000
    108/108 [==============================] - 0s 654us/step - loss: 0.0794 - accuracy: 0.9043 - val_loss: 0.0756 - val_accuracy: 0.9078
    Epoch 58/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1566 - accuracy: 0.8125
    108/108 [==============================] - 0s 652us/step - loss: 0.0795 - accuracy: 0.9032 - val_loss: 0.0757 - val_accuracy: 0.9078
    Epoch 59/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1021 - accuracy: 0.8750
    108/108 [==============================] - 0s 656us/step - loss: 0.0795 - accuracy: 0.9032 - val_loss: 0.0757 - val_accuracy: 0.9078
    Epoch 60/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0279 - accuracy: 0.9375
    108/108 [==============================] - 0s 682us/step - loss: 0.0794 - accuracy: 0.9032 - val_loss: 0.0755 - val_accuracy: 0.9096
    Epoch 61/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1796 - accuracy: 0.7500
    108/108 [==============================] - 0s 652us/step - loss: 0.0794 - accuracy: 0.9038 - val_loss: 0.0755 - val_accuracy: 0.9113
    Epoch 62/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0737 - accuracy: 0.9375
    108/108 [==============================] - 0s 664us/step - loss: 0.0793 - accuracy: 0.9032 - val_loss: 0.0757 - val_accuracy: 0.9096
    Epoch 63/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0553 - accuracy: 0.9375
    108/108 [==============================] - 0s 651us/step - loss: 0.0793 - accuracy: 0.9038 - val_loss: 0.0755 - val_accuracy: 0.9113
    Epoch 64/1000
    1/108 [..............................] - ETA: 0s - loss: 0.2263 - accuracy: 0.7500
    108/108 [==============================] - 0s 653us/step - loss: 0.0793 - accuracy: 0.9038 - val_loss: 0.0755 - val_accuracy: 0.9096
    Epoch 65/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0142 - accuracy: 1.0000
    108/108 [==============================] - 0s 674us/step - loss: 0.0793 - accuracy: 0.9026 - val_loss: 0.0756 - val_accuracy: 0.9096
    Epoch 66/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0589 - accuracy: 0.9375
    108/108 [==============================] - 0s 661us/step - loss: 0.0794 - accuracy: 0.9038 - val_loss: 0.0755 - val_accuracy: 0.9096
    Epoch 67/1000
    1/108 [..............................] - ETA: 0s - loss: 0.2167 - accuracy: 0.7500
    108/108 [==============================] - 0s 659us/step - loss: 0.0793 - accuracy: 0.9032 - val_loss: 0.0758 - val_accuracy: 0.9096
    Epoch 68/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1165 - accuracy: 0.8125
    108/108 [==============================] - 0s 663us/step - loss: 0.0792 - accuracy: 0.9038 - val_loss: 0.0757 - val_accuracy: 0.9113
    Epoch 69/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0706 - accuracy: 0.8750
    108/108 [==============================] - 0s 650us/step - loss: 0.0792 - accuracy: 0.9014 - val_loss: 0.0761 - val_accuracy: 0.9061
    Epoch 70/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0709 - accuracy: 0.8750
    108/108 [==============================] - 0s 660us/step - loss: 0.0791 - accuracy: 0.9032 - val_loss: 0.0757 - val_accuracy: 0.9096
    Epoch 71/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1818 - accuracy: 0.7500
    108/108 [==============================] - 0s 690us/step - loss: 0.0792 - accuracy: 0.9038 - val_loss: 0.0758 - val_accuracy: 0.9078
    Epoch 72/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0556 - accuracy: 0.9375
    108/108 [==============================] - 0s 656us/step - loss: 0.0793 - accuracy: 0.9032 - val_loss: 0.0758 - val_accuracy: 0.9096
    Epoch 73/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0213 - accuracy: 1.0000
    108/108 [==============================] - 0s 670us/step - loss: 0.0792 - accuracy: 0.9026 - val_loss: 0.0758 - val_accuracy: 0.9096
    Epoch 74/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0481 - accuracy: 0.9375
    108/108 [==============================] - 0s 642us/step - loss: 0.0792 - accuracy: 0.9055 - val_loss: 0.0757 - val_accuracy: 0.9096
    Epoch 75/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0894 - accuracy: 0.8750
    108/108 [==============================] - 0s 672us/step - loss: 0.0791 - accuracy: 0.9026 - val_loss: 0.0760 - val_accuracy: 0.9061
    Epoch 76/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1361 - accuracy: 0.8125
    105/108 [============================>.] - ETA: 0s - loss: 0.0806 - accuracy: 0.9012
    108/108 [==============================] - 0s 670us/step - loss: 0.0792 - accuracy: 0.9032 - val_loss: 0.0761 - val_accuracy: 0.9078
    Epoch 77/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0763 - accuracy: 0.9375
    107/108 [============================>.] - ETA: 0s - loss: 0.0786 - accuracy: 0.9042
    108/108 [==============================] - 0s 686us/step - loss: 0.0791 - accuracy: 0.9038 - val_loss: 0.0759 - val_accuracy: 0.9078
    Epoch 78/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0248 - accuracy: 1.0000
    108/108 [==============================] - 0s 647us/step - loss: 0.0790 - accuracy: 0.9038 - val_loss: 0.0760 - val_accuracy: 0.9078
    Epoch 79/1000
    1/108 [..............................] - ETA: 0s - loss: 0.0115 - accuracy: 1.0000
    108/108 [==============================] - 0s 682us/step - loss: 0.0789 - accuracy: 0.9038 - val_loss: 0.0761 - val_accuracy: 0.9078
    Epoch 80/1000
    1/108 [..............................] - ETA: 0s - loss: 0.1319 - accuracy: 0.8125
    108/108 [==============================] - 0s 661us/step - loss: 0.0792 - accuracy: 0.9043 - val_loss: 0.0762 - val_accuracy: 0.9078
    Epoch 81/1000
    1/108 [..............................]
- ETA: 0s - loss: 0.0714 - accuracy: 0.8750 108/108 [==============================] - 0s 677us/step - loss: 0.0789 - accuracy: 0.9038 - val_loss: 0.0761 - val_accuracy: 0.9113 Epoch 82/1000 1/108 [..............................] - ETA: 0s - loss: 0.1032 - accuracy: 0.8750 108/108 [==============================] - 0s 663us/step - loss: 0.0789 - accuracy: 0.9026 - val_loss: 0.0759 - val_accuracy: 0.9096 Epoch 83/1000 1/108 [..............................] - ETA: 0s - loss: 0.0653 - accuracy: 0.8750 108/108 [==============================] - 0s 619us/step - loss: 0.0789 - accuracy: 0.9026 - val_loss: 0.0761 - val_accuracy: 0.9113 Epoch 84/1000 1/108 [..............................] - ETA: 0s - loss: 0.1313 - accuracy: 0.8750 108/108 [==============================] - 0s 666us/step - loss: 0.0791 - accuracy: 0.9020 - val_loss: 0.0761 - val_accuracy: 0.9096 Epoch 85/1000 1/108 [..............................] - ETA: 0s - loss: 0.1216 - accuracy: 0.8750 108/108 [==============================] - 0s 649us/step - loss: 0.0789 - accuracy: 0.9038 - val_loss: 0.0761 - val_accuracy: 0.9096

.. GENERATED FROM PYTHON SOURCE LINES 278-283

Assessing our model
----------------------------
To assess our model, we calculate the ROC AUC and inspect the calibration curve. The plots show that the model underestimates some of the higher probabilities, but these are satisfactory results given the small amount of data and the shallow network. We also calculate the Brier score on unseen data. It amounts to 0.08, which is a good score (the Brier score ranges from 0 to 1, and lower is better).

.. GENERATED FROM PYTHON SOURCE LINES 283-309

.. code-block:: default

    #ROC CURVE
    from sklearn.metrics import roc_curve, roc_auc_score, brier_score_loss
    fig, axs = plt.subplots(2, figsize=(10,12))
    y_pred = model.predict(X_cal)
    fpr, tpr, _ = roc_curve(y_cal, y_pred)
    auc = roc_auc_score(y_cal, y_pred)
    axs[0].plot(fpr,tpr,label= "AUC = " + str(auc)[:4])
    axs[0].plot([0, 1], [0, 1], color='black', ls = '--')
    axs[0].legend()
    axs[0].set_ylabel('True Positive Rate')
    axs[0].set_xlabel('False Positive Rate')
    axs[0].set_title('ROC curve')

    #CALIBRATION CURVE
    from sklearn.calibration import calibration_curve
    #calibration_curve returns (prob_true, prob_pred): empirical frequency and mean prediction per bin
    prob_true, prob_pred = calibration_curve(y_cal, y_pred, n_bins=10)
    #predicted probability goes on the x-axis and empirical probability on the y-axis, matching the labels
    axs[1].plot(prob_pred, prob_true)
    axs[1].plot([0, 1], [0, 1], color='black', ls = '--')
    axs[1].set_ylabel('Empirical Probability')
    axs[1].set_xlabel('Predicted Probability')
    axs[1].set_title("Calibration curve")
    plt.show()
    #Brier score
    print("Brier score", brier_score_loss(y_cal, y_pred))

.. image-sg:: /gallery/lesson7/images/sphx_glr_plot_xG_tracking_002.png
   :alt: ROC curve, Calibration curve
   :srcset: /gallery/lesson7/images/sphx_glr_plot_xG_tracking_002.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    1/18 [>.............................] - ETA: 0s 18/18 [==============================] - 0s 380us/step
    Brier score 0.08189221088647439

.. GENERATED FROM PYTHON SOURCE LINES 310-315

Calculating xG using our model during UEFA Euro 2020
----------------------------------------------------
With a trained model, we can now apply it to a dataset of our choice. We chose UEFA Euro 2020. First, we store the data the same way as for the Indian Super League. Then, we apply the same data transformations as on our training dataset. Next, we scale the data with the previously fitted scaler and make predictions. Finally, we find the 5 players who accumulated the highest open play expected goals during the tournament.

.. GENERATED FROM PYTHON SOURCE LINES 315-371

.. code-block:: default

    #getting trackings and events for UEFA Euro the same way as we did for Indian Super League
    df_match2 = parser.match(competition_id=55, season_id=43)
    #get array of match ids
    matches2 = df_match2.match_id.unique()
    shot_df2 = pd.DataFrame()
    track_df2 = pd.DataFrame()
    #for each match store shots and trackings in dataframes for the entire tournament
    for match in matches2:
        df_event = parser.event(match)[0]
        df_track = parser.event(match)[2]
        shots = df_event.loc[df_event["type_name"] == "Shot"]
        #change yards to meters
        shots.x = shots.x.apply(lambda cell: cell*105/120)
        shots.y = shots.y.apply(lambda cell: cell*68/80)
        df_track.x = df_track.x.apply(lambda cell: cell*105/120)
        df_track.y = df_track.y.apply(lambda cell: cell*68/80)
        shot_df2 = pd.concat([shot_df2, shots], ignore_index = True)
        track_df2 = pd.concat([track_df2, df_track], ignore_index = True)

    #reset indices and remove shots that were not open play or when the goalkeeper was not tracked
    shot_df2 = shot_df2.loc[shot_df2["sub_type_name"] == "Open Play"]
    shot_df2.reset_index(drop=True, inplace=True)
    track_df2.reset_index(drop=True, inplace=True)
    gks_tracked2 = track_df2.loc[track_df2["teammate"] == False].loc[track_df2["position_name"] == "Goalkeeper"]['id'].unique()
    shot_df2 = shot_df2.loc[shot_df2["id"].isin(gks_tracked2)]

    #DATA WRANGLING. DESCRIPTION OF THESE STEPS CAN BE FOUND IN FEATURE ENGINEERING PART
    model_vars2 = shot_df2[["id", "index", "x", "y"]]
    model_vars2["goal"] = shot_df2.outcome_name.apply(lambda cell: 1 if cell == "Goal" else 0)
    model_vars2["goal_smf"] = model_vars2["goal"].astype(object)
    model_vars2['x0'] = model_vars2.x
    model_vars2["x"] = model_vars2.x.apply(lambda cell: 105-cell)
    model_vars2["c"] = model_vars2.y.apply(lambda cell: abs(34-cell))
    model_vars2["angle"] = np.where(np.arctan(7.32 * model_vars2["x"] / (model_vars2["x"]**2 + model_vars2["c"]**2 - (7.32/2)**2)) >= 0, np.arctan(7.32 * model_vars2["x"] /(model_vars2["x"]**2 + model_vars2["c"]**2 - (7.32/2)**2)), np.arctan(7.32 * model_vars2["x"] /(model_vars2["x"]**2 + model_vars2["c"]**2 - (7.32/2)**2)) + np.pi)*180/np.pi
    model_vars2["distance"] = np.sqrt(model_vars2["x"]**2 + model_vars2["c"]**2)
    model_vars2["xg_basic"]= model_vars2.apply(calculate_xG, b = b, axis=1)
    model_vars2["gk_distance"] = shot_df2.apply(dist_to_gk, track_df = track_df2, axis = 1)
    model_vars2["gk_distance_y"] = shot_df2.apply(y_to_gk, track_df = track_df2, axis = 1)
    model_vars2["triangle"] = shot_df2.apply(players_in_triangle, track_df = track_df2, axis = 1)
    model_vars2["close_players"] = shot_df2.apply(three_meters_away, track_df = track_df2, axis = 1)
    model_vars2["gk_dist_to_goal"] = shot_df2.apply(gk_dist_to_goal, track_df = track_df2, axis = 1)
    model_vars2["is_closer"] = np.where(model_vars2["gk_dist_to_goal"] > model_vars2["distance"], 1, 0)
    model_vars2["header"] = shot_df2.body_part_name.apply(lambda cell: 1 if cell == "Head" else 0)

    #store data in a matrix
    X_unseen = model_vars2[["x0", "is_closer", "angle", "distance", "gk_distance", "gk_distance_y", "triangle", "close_players", "header", "xg_basic"]].values
    #scale data
    X_unseen = scaler.transform(X_unseen)
    #make predictions
    xgs_euro = model.predict(X_unseen)
    #find out which 5 players had the highest xG
    shot_df2["our_xG"] = xgs_euro
    shot_df2.groupby(["player_name"])["our_xG"].sum().sort_values(ascending = False)[:5].reset_index()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    1/38 [..............................] - ETA: 0s 38/38 [==============================] - 0s 393us/step

.. raw:: html

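
    <table>
      <thead>
        <tr><th></th><th>player_name</th><th>our_xG</th></tr>
      </thead>
      <tbody>
        <tr><th>0</th><td>Álvaro Borja Morata Martín</td><td>2.519620</td></tr>
        <tr><th>1</th><td>Cristiano Ronaldo dos Santos Aveiro</td><td>2.338468</td></tr>
        <tr><th>2</th><td>Kai Havertz</td><td>2.318291</td></tr>
        <tr><th>3</th><td>Harry Kane</td><td>2.306922</td></tr>
        <tr><th>4</th><td>Ciro Immobile</td><td>1.847474</td></tr>
      </tbody>
    </table>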
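
The per-player aggregation used for the table above can be tried without the StatsBomb data. The following is a minimal sketch on made-up numbers: the player labels and prediction values are invented for illustration, and ``toy_shots`` and ``toy_preds`` are hypothetical names that do not appear in the lesson. It assumes the network has a single output unit, in which case Keras ``model.predict`` typically returns an array of shape ``(n, 1)``, so the sketch flattens it before storing one xG value per shot.

```python
import numpy as np
import pandas as pd

# hypothetical shots; player labels are made up for illustration
toy_shots = pd.DataFrame(
    {"player_name": ["Player A", "Player B", "Player A",
                     "Player C", "Player B", "Player A"]}
)

# stand-in for model.predict(X_unseen): one row per shot, one column
toy_preds = np.array([[0.10], [0.25], [0.05], [0.40], [0.30], [0.15]])

# flatten to 1D so that every shot gets a scalar xG value
toy_shots["our_xG"] = toy_preds.ravel()

# sum xG per player, sort from highest to lowest and keep the top 2
top = (
    toy_shots.groupby("player_name")["our_xG"]
    .sum()
    .sort_values(ascending=False)
    .head(2)
    .reset_index()
)
print(top)
```

Summing predicted probabilities per player gives the number of goals an average finisher would be expected to score from those shots, which is why the ranking rewards both shot volume and shot quality.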


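
To put a Brier score such as the 0.08 reported in the assessment section into context, it can help to compare it with a naive baseline that predicts the overall goal rate for every shot. The sketch below uses synthetic labels and probabilities, not the course data, and the names ``p_true``, ``y_true``, ``p_model`` and ``p_base`` are hypothetical. It computes the Brier score by hand as the mean squared difference between predicted probability and outcome, which is the same quantity ``sklearn.metrics.brier_score_loss`` returns for binary labels.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic "true" scoring probabilities for 2000 shots, mostly low values
p_true = rng.beta(1, 8, size=2000)
# simulate shot outcomes (1 = goal, 0 = no goal) from those probabilities
y_true = rng.binomial(1, p_true)

# an idealised model that knows the true probabilities
p_model = p_true
# a naive baseline that predicts the observed goal rate for every shot
p_base = np.full_like(p_true, y_true.mean())

def brier(y, p):
    """Mean squared difference between predicted probability and outcome."""
    return float(np.mean((p - y) ** 2))

brier_model = brier(y_true, p_model)
brier_base = brier(y_true, p_base)
print("model:", brier_model, "baseline:", brier_base)
```

Because goals are rare, even the constant baseline gets a low Brier score, so a model's score is most informative when read next to such a baseline rather than on its own.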
.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (3 minutes 26.908 seconds)


.. _sphx_glr_download_gallery_lesson7_plot_xG_tracking.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_xG_tracking.py `

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_xG_tracking.ipynb `

.. only:: html

  .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_