.. DO NOT EDIT. .. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY. .. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE: .. "gallery/lesson2/plot_xGModelFit.py" .. LINE NUMBERS ARE GIVEN BELOW. .. only:: html .. note:: :class: sphx-glr-download-link-note Click :ref:`here ` to download the full example code .. rst-class:: sphx-glr-example-title .. _sphx_glr_gallery_lesson2_plot_xGModelFit.py: Fitting the xG model ==================== In this page we go through all the steps of statistically fitting an expected goals model. Before starting watch the following two videos from Friends of Tracking. .. youtube:: bpjLyFyLlXs :width: 640 :height: 349 You will be sad to learn that Tobias (the dog featuring in this video) died suddenly during summer 2022 :-( .. youtube:: wHOgINJ5g54 :width: 640 :height: 349 These use an older version of the code, which is available `here `_. But the steps are the same. .. GENERATED FROM PYTHON SOURCE LINES 26-46 .. code-block:: default # importing necessary libraries import pandas as pd import numpy as np import json # plotting import matplotlib.pyplot as plt from mplsoccer import VerticalPitch # statistical fitting of models import statsmodels.api as sm import statsmodels.formula.api as smf #opening data import os import pathlib import warnings pd.options.mode.chained_assignment = None warnings.filterwarnings('ignore') .. GENERATED FROM PYTHON SOURCE LINES 47-51 Opening data ---------------------------- To fit the xG model we will use Wyscout data. To meet file size requirements of Github, we have to open it from different files, but you can open the file locally from the directory you saved it in. .. GENERATED FROM PYTHON SOURCE LINES 51-61 .. code-block:: default #load data - store it in train dataframe train = pd.DataFrame() for i in range(13): file_name = 'events_England_' + str(i+1) + '.json' path = os.path.join(str(pathlib.Path().resolve().parents[0]), 'data', 'Wyscout', file_name) with open(path) as f: data = json.load(f) train = pd.concat([train, pd.DataFrame(data)]) .. GENERATED FROM PYTHON SOURCE LINES 62-71 Preparing data ---------------------------- Exepcted goals model is build using only shots, so we keep only those actions which *subEventName* was *Shot*. Note that this way penalties are excluded which wouldn't be a case if we used only *eventName*. Then, we store the coordinates of a shot transformed to 105 x 68 pitch. Also, we treat the goal as x = 0. Created *C* is an auxillary variable to help us calculate distance and angle. It is the distance from a point to the horizontal line through the middle of the pitch. We calculate the distance to the goal as the distance on Euclidean plane (see `Distance in R2 `_). and angle using the formula from `The Geometry of Shooting `_. Moreover, we need an information if a goal was scored. It can be found in the *tags* column - if in this column exists {*id*: *101*}. .. GENERATED FROM PYTHON SOURCE LINES 71-83 .. code-block:: default shots = train.loc[train['subEventName'] == 'Shot'] #get shot coordinates as separate columns shots["X"] = shots.positions.apply(lambda cell: (100 - cell[0]['x']) * 105/100) shots["Y"] = shots.positions.apply(lambda cell: cell[0]['y'] * 68/100) shots["C"] = shots.positions.apply(lambda cell: abs(cell[0]['y'] - 50) * 68/100) #calculate distance and angle shots["Distance"] = np.sqrt(shots["X"]**2 + shots["C"]**2) shots["Angle"] = np.where(np.arctan(7.32 * shots["X"] / (shots["X"]**2 + shots["C"]**2 - (7.32/2)**2)) > 0, np.arctan(7.32 * shots["X"] /(shots["X"]**2 + shots["C"]**2 - (7.32/2)**2)), np.arctan(7.32 * shots["X"] /(shots["X"]**2 + shots["C"]**2 - (7.32/2)**2)) + np.pi) #if you ever encounter problems (like you have seen that model treats 0 as 1 and 1 as 0) while modelling - change the dependant variable to object shots["Goal"] = shots.tags.apply(lambda x: 1 if {'id':101} in x else 0) .. GENERATED FROM PYTHON SOURCE LINES 84-88 Plotting shot location ---------------------------- Since we would like to investigate the relationship between shot location and goal location, first we create a heat map of all shots from the 2017/18 Premier League season. .. GENERATED FROM PYTHON SOURCE LINES 88-104 .. code-block:: default #plot pitch pitch = VerticalPitch(line_color='black', half = True, pitch_type='custom', pitch_length=105, pitch_width=68, line_zorder = 2) fig, ax = pitch.grid(grid_height=0.9, title_height=0.06, axis=False, endnote_height=0.04, title_space=0, endnote_space=0) #subtracting x from 105 but not y from 68 because of inverted Wyscout axis #calculate number of shots in each bin bin_statistic_shots = pitch.bin_statistic(105 - shots.X, shots.Y, bins=50) #make heatmap pcm = pitch.heatmap(bin_statistic_shots, ax=ax["pitch"], cmap='Reds', edgecolor='white', linewidth = 0.01) #make legend ax_cbar = fig.add_axes((0.95, 0.05, 0.04, 0.8)) cbar = plt.colorbar(pcm, cax=ax_cbar) fig.suptitle('Shot map - 2017/2018 Premier League Season' , fontsize = 30) plt.show() .. image-sg:: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_001.png :alt: Shot map - 2017/2018 Premier League Season :srcset: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_001.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 105-108 Plotting goal location ---------------------------- Having the shot location, we would also like to know where the goals were scored from. .. GENERATED FROM PYTHON SOURCE LINES 108-126 .. code-block:: default #take only goals goals = shots.loc[shots["Goal"] == 1] #plot pitch pitch = VerticalPitch(line_color='black', half = True, pitch_type='custom', pitch_length=105, pitch_width=68, line_zorder = 2) fig, ax = pitch.grid(grid_height=0.9, title_height=0.06, axis=False, endnote_height=0.04, title_space=0, endnote_space=0) #calculate number of goals in each bin bin_statistic_goals = pitch.bin_statistic(105 - goals.X, goals.Y, bins=50) #plot heatmap pcm = pitch.heatmap(bin_statistic_goals, ax=ax["pitch"], cmap='Reds', edgecolor='white') #make legend ax_cbar = fig.add_axes((0.95, 0.05, 0.04, 0.8)) cbar = plt.colorbar(pcm, cax=ax_cbar) fig.suptitle('Goal map - 2017/2018 Premier League Season' , fontsize = 30) plt.show() .. image-sg:: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_002.png :alt: Goal map - 2017/2018 Premier League Season :srcset: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_002.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 127-130 Plotting the probability of scoring a goal given the location ---------------------------- Now, we can calculate the proportion of goals scored from each bin to number of shots from that location. .. GENERATED FROM PYTHON SOURCE LINES 130-146 .. code-block:: default #plot pitch pitch = VerticalPitch(line_color='black', half = True, pitch_type='custom', pitch_length=105, pitch_width=68, line_zorder = 2) fig, ax = pitch.grid(grid_height=0.9, title_height=0.06, axis=False, endnote_height=0.04, title_space=0, endnote_space=0) bin_statistic = pitch.bin_statistic(105 - shots.X, shots.Y, bins = 50) #normalize number of goals by number of shots bin_statistic["statistic"] = bin_statistic_goals["statistic"]/bin_statistic["statistic"] #plot heatmap pcm = pitch.heatmap(bin_statistic, ax=ax["pitch"], cmap='Reds', edgecolor='white', vmin = 0, vmax = 0.6) #make legend ax_cbar = fig.add_axes((0.95, 0.05, 0.04, 0.8)) cbar = plt.colorbar(pcm, cax=ax_cbar) fig.suptitle('Probability of scoring' , fontsize = 30) plt.show() .. image-sg:: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_003.png :alt: Probability of scoring :srcset: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_003.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 147-150 Plotting a logistic curve ---------------------------- Plotting logistic curve .. GENERATED FROM PYTHON SOURCE LINES 150-165 .. code-block:: default b = [3, -3] x = np.arange(5, step=0.1) y = 1/(1+np.exp(b[0]+b[1]*x)) fig,ax = plt.subplots() plt.ylim((-0.05,1.05)) plt.xlim((0,5)) ax.set_ylabel('y') ax.set_xlabel("x") ax.plot(x, y, linestyle='solid', color='black') ax.spines['top'].set_visible(False) ax.spines['right'].set_visible(False) plt.show() .. image-sg:: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_004.png :alt: plot xGModelFit :srcset: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_004.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 166-170 Investigating the relationship between goals and angle ---------------------------- We want to find out if the angle influences scoring a goal. First we plot if goal was scored given the angle. .. GENERATED FROM PYTHON SOURCE LINES 170-185 .. code-block:: default #first 200 shots shots_200=shots.iloc[:200] #plot first 200 shots goal angle fig, ax = plt.subplots() ax.plot(shots_200['Angle']*180/np.pi, shots_200['Goal'], linestyle='none', marker= '.', markersize= 12, color='black') #make legend ax.set_ylabel('Goal scored') ax.set_xlabel("Shot angle (degrees)") plt.ylim((-0.05,1.05)) ax.set_yticks([0,1]) ax.set_yticklabels(['No','Yes']) plt.show() .. image-sg:: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_005.png :alt: plot xGModelFit :srcset: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_005.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 186-190 Investigating the relationship between probability of scoring goals and angle ---------------------------- We want to find out if the angle influences the probability of scoring a goal. First we plot if goal was scored given the angle. .. GENERATED FROM PYTHON SOURCE LINES 190-210 .. code-block:: default #number of shots from angle shotcount_dist = np.histogram(shots['Angle']*180/np.pi, bins=40, range=[0, 150]) #number of goals from angle goalcount_dist = np.histogram(goals['Angle']*180/np.pi, bins=40, range=[0, 150]) np.seterr(divide='ignore', invalid='ignore') #Ignore divide by 0 #probability of scoring goal prob_goal = np.divide(goalcount_dist[0], shotcount_dist[0]) angle = shotcount_dist[1] #Midangle of each interval midangle = (angle[:-1] + angle[1:])/2 #make plot fig,ax = plt.subplots() ax.plot(midangle, prob_goal, linestyle='none', marker= '.', markersize= 12, color='black') ax.set_ylabel('Probability chance scored') ax.set_xlabel("Shot angle (degrees)") ax.spines['top'].set_visible(False) ax.spines['right'].set_visible(False) plt.show() .. image-sg:: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_006.png :alt: plot xGModelFit :srcset: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_006.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 211-215 Fitting logistic regression with random coefficients ---------------------------- To our data we fit a logistic regression curve with set parameters; -3 for intercept and 3 for angle. However, these are most likely not the best estimators of true parameters. .. GENERATED FROM PYTHON SOURCE LINES 215-230 .. code-block:: default fig, ax = plt.subplots() b = [-3, 3] x = np.arange(150,step=0.1) y = 1/(1+np.exp(-(b[0]+b[1]*x*np.pi/180))) #plot line ax.plot(midangle, prob_goal, linestyle='none', marker= '.', markersize= 12, color='black') #plot logistic function ax.plot(x, y, linestyle='solid', color='black') ax.set_ylabel('Probability chance scored') ax.set_xlabel("Shot angle (degrees)") ax.spines['top'].set_visible(False) ax.spines['right'].set_visible(False) plt.show() .. image-sg:: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_007.png :alt: plot xGModelFit :srcset: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_007.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 231-234 Calculating log-likelihood ---------------------------- The best parameters are those which maximize the log-likelihood. .. GENERATED FROM PYTHON SOURCE LINES 234-268 .. code-block:: default #calculate xG xG = 1/(1+np.exp(-(b[0]+b[1]*shots['Angle']))) shots = shots.assign(xG = xG) shots_40 = shots.iloc[:40] fig, ax = plt.subplots() #plot data ax.plot(shots_40['Angle']*180/np.pi, shots_40['Goal'], linestyle='none', marker= '.', markersize= 12, color='black', zorder = 3) #plot curves ax.plot(x, y, linestyle=':', color='black', zorder = 2) ax.plot(x, 1-y, linestyle='solid', color='black', zorder = 2) #calculate loglikelihood loglikelihood=0 for item,shot in shots_40.iterrows(): ang = shot['Angle'] * 180/np.pi if shot['Goal'] == 1: loglikelihood = loglikelihood + np.log(shot['xG']) ax.plot([ang,ang],[shot['Goal'],1-shot['xG']], color='red', zorder = 1) else: loglikelihood = loglikelihood + np.log(1 - shot['xG']) ax.plot([ang,ang], [shot['Goal'], 1-shot['xG']], color='blue', zorder = 1) #make legend ax.set_ylabel('Goal scored') ax.set_xlabel("Shot angle (degrees)") plt.ylim((-0.05,1.05)) plt.xlim((0,180)) plt.text(120,0.5,'Log-likelihood:') plt.text(120,0.4,str(loglikelihood)[:6]) ax.set_yticks([0,1]) ax.spines['top'].set_visible(False) ax.spines['right'].set_visible(False) plt.show() .. image-sg:: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_008.png :alt: plot xGModelFit :srcset: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_008.png :class: sphx-glr-single-img .. GENERATED FROM PYTHON SOURCE LINES 269-272 Fitting logistic regression and finding best parameters ---------------------------- The best parameters are those which maximize the log-likelihood. .. GENERATED FROM PYTHON SOURCE LINES 272-293 .. code-block:: default #create model test_model = smf.glm(formula="Goal ~ Angle" , data=shots, family=sm.families.Binomial()).fit() print(test_model.summary()) #get params b=test_model.params #calculate xG xGprob = 1/(1+np.exp(-(b[0]+b[1]*midangle*np.pi/180))) fig, ax = plt.subplots() #plot data ax.plot(midangle, prob_goal, linestyle='none', marker= '.', markersize= 12, color='black') #plot line ax.plot(midangle, xGprob, linestyle='solid', color='black') #make legend ax.set_ylabel('Probability chance scored') ax.set_xlabel("Shot angle (degrees)") ax.spines['top'].set_visible(False) ax.spines['right'].set_visible(False) plt.show() .. image-sg:: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_009.png :alt: plot xGModelFit :srcset: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_009.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none Generalized Linear Model Regression Results ============================================================================== Dep. Variable: Goal No. Observations: 8451 Model: GLM Df Residuals: 8449 Model Family: Binomial Df Model: 1 Link Function: Logit Scale: 1.0000 Method: IRLS Log-Likelihood: -2561.2 Date: Thu, 20 Nov 2025 Deviance: 5122.5 Time: 06:56:32 Pearson chi2: 7.96e+03 No. Iterations: 6 Pseudo R-squ. (CS): 0.07609 Covariance Type: nonrobust ============================================================================== coef std err z P>|z| [0.025 0.975] ------------------------------------------------------------------------------ Intercept -3.5248 0.074 -47.517 0.000 -3.670 -3.379 Angle 2.8436 0.117 24.391 0.000 2.615 3.072 ============================================================================== .. GENERATED FROM PYTHON SOURCE LINES 294-298 Investigating the relationship between probability of scoring goals and distance to goal ---------------------------- We want to find out if the distanse influences the probability of scoring a goal. First we plot the probability of scoring given the distance. Then, we fit logistic regression to the data. .. GENERATED FROM PYTHON SOURCE LINES 298-333 .. code-block:: default #number of shots shotcount_dist = np.histogram(shots['Distance'],bins=40,range=[0, 70]) #number of goals goalcount_dist = np.histogram(goals['Distance'],bins=40,range=[0, 70]) #empirical probability of scoring prob_goal = np.divide(goalcount_dist[0],shotcount_dist[0]) distance = shotcount_dist[1] middistance= (distance[:-1] + distance[1:])/2 #making a plot fig, ax = plt.subplots() #plotting data ax.plot(middistance, prob_goal, linestyle='none', marker= '.', color='black') #making legend ax.set_ylabel('Probability chance scored') ax.set_xlabel("Distance from goal (metres)") ax.spines['top'].set_visible(False) ax.spines['right'].set_visible(False) #make single variable model of distance test_model = smf.glm(formula="Goal ~ Distance" , data=shots, family=sm.families.Binomial()).fit() #print summary print(test_model.summary()) b=test_model.params #calculate xG xGprob=1/(1+np.exp(-(b[0]+b[1]*middistance))) #plot line ax.plot(middistance, xGprob, linestyle='solid', color='black') plt.show() .. image-sg:: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_010.png :alt: plot xGModelFit :srcset: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_010.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none Generalized Linear Model Regression Results ============================================================================== Dep. Variable: Goal No. Observations: 8451 Model: GLM Df Residuals: 8449 Model Family: Binomial Df Model: 1 Link Function: Logit Scale: 1.0000 Method: IRLS Log-Likelihood: -2524.4 Date: Thu, 20 Nov 2025 Deviance: 5048.9 Time: 06:56:32 Pearson chi2: 1.56e+04 No. Iterations: 6 Pseudo R-squ. (CS): 0.08410 Covariance Type: nonrobust ============================================================================== coef std err z P>|z| [0.025 0.975] ------------------------------------------------------------------------------ Intercept 0.1555 0.090 1.732 0.083 -0.020 0.331 Distance -0.1496 0.006 -23.243 0.000 -0.162 -0.137 ============================================================================== .. GENERATED FROM PYTHON SOURCE LINES 334-339 Adding squared distance to the model ---------------------------- To our model we can add more variables than only one. We can try adding distance to goal squared and see if it improves our predictions. calculating distance squared .. GENERATED FROM PYTHON SOURCE LINES 339-361 .. code-block:: default shots["D2"] = shots['Distance']**2 #adding it to the model test_model = smf.glm(formula="Goal ~ Distance + D2" , data=shots, family=sm.families.Binomial()).fit() #print model summary print(test_model.summary()) #get parameters b=test_model.params #calculate xG xGprob=1/(1+np.exp(-(b[0]+b[1]*middistance+b[2]*pow(middistance,2)))) fig, ax = plt.subplots() #plot line ax.plot(middistance, prob_goal, linestyle='none', marker= '.', color='black') #make legend ax.set_ylabel('Probability chance scored') ax.set_xlabel("Distance from goal (metres)") ax.spines['top'].set_visible(False) ax.spines['right'].set_visible(False) ax.plot(middistance, xGprob, linestyle='solid', color='black') plt.show() .. image-sg:: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_011.png :alt: plot xGModelFit :srcset: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_011.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none Generalized Linear Model Regression Results ============================================================================== Dep. Variable: Goal No. Observations: 8451 Model: GLM Df Residuals: 8448 Model Family: Binomial Df Model: 2 Link Function: Logit Scale: 1.0000 Method: IRLS Log-Likelihood: -2505.6 Date: Thu, 20 Nov 2025 Deviance: 5011.1 Time: 06:56:32 Pearson chi2: 8.44e+03 No. Iterations: 7 Pseudo R-squ. (CS): 0.08818 Covariance Type: nonrobust ============================================================================== coef std err z P>|z| [0.025 0.975] ------------------------------------------------------------------------------ Intercept 0.8705 0.132 6.618 0.000 0.613 1.128 Distance -0.2591 0.016 -16.482 0.000 -0.290 -0.228 D2 0.0034 0.000 8.262 0.000 0.003 0.004 ============================================================================== .. GENERATED FROM PYTHON SOURCE LINES 362-366 Adding squared distance to the model ---------------------------- To our model we can add more variables than only one. We can try adding distance to goal squared and see if it improves our predictions. .. GENERATED FROM PYTHON SOURCE LINES 366-430 .. code-block:: default #creating extra variables shots["X2"] = shots['X']**2 shots["C2"] = shots['C']**2 shots["AX"] = shots['Angle']*shots['X'] # list the model variables you want here model_variables = ['Angle','Distance','X','C', "X2", "C2", "AX"] model='' for v in model_variables[:-1]: model = model + v + ' + ' model = model + model_variables[-1] #fit the model test_model = smf.glm(formula="Goal ~ " + model, data=shots, family=sm.families.Binomial()).fit() #print summary print(test_model.summary()) b=test_model.params #return xG value for more general model def calculate_xG(sh): bsum=b[0] for i,v in enumerate(model_variables): bsum=bsum+b[i+1]*sh[v] xG = 1/(1+np.exp(-bsum)) return xG #add an xG to my dataframe xG=shots.apply(calculate_xG, axis=1) shots = shots.assign(xG=xG) #Create a 2D map of xG pgoal_2d=np.zeros((68,68)) for x in range(68): for y in range(68): sh=dict() a = np.arctan(7.32 *x /(x**2 + abs(y-68/2)**2 - (7.32/2)**2)) if a<0: a = np.pi + a sh['Angle'] = a sh['Distance'] = np.sqrt(x**2 + abs(y-68/2)**2) sh['D2'] = x**2 + abs(y-68/2)**2 sh['X'] = x sh['AX'] = x*a sh['X2'] = x**2 sh['C'] = abs(y-68/2) sh['C2'] = (y-68/2)**2 pgoal_2d[x,y] = calculate_xG(sh) #plot pitch pitch = VerticalPitch(line_color='black', half = True, pitch_type='custom', pitch_length=105, pitch_width=68, line_zorder = 2) fig, ax = pitch.draw() #plot probability pos = ax.imshow(pgoal_2d, extent=[-1,68,68,-1], aspect='auto',cmap=plt.cm.Reds,vmin=0, vmax=0.3, zorder = 1) fig.colorbar(pos, ax=ax) #make legend ax.set_title('Probability of goal') plt.xlim((0,68)) plt.ylim((0,60)) plt.gca().set_aspect('equal', adjustable='box') plt.show() .. image-sg:: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_012.png :alt: Probability of goal :srcset: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_012.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none Generalized Linear Model Regression Results ============================================================================== Dep. Variable: Goal No. Observations: 8451 Model: GLM Df Residuals: 8443 Model Family: Binomial Df Model: 7 Link Function: Logit Scale: 1.0000 Method: IRLS Log-Likelihood: -2498.7 Date: Thu, 20 Nov 2025 Deviance: 4997.4 Time: 06:56:33 Pearson chi2: 8.40e+03 No. Iterations: 7 Pseudo R-squ. (CS): 0.08966 Covariance Type: nonrobust ============================================================================== coef std err z P>|z| [0.025 0.975] ------------------------------------------------------------------------------ Intercept 0.5103 0.887 0.576 0.565 -1.228 2.248 Angle 0.6338 0.319 1.989 0.047 0.009 1.258 Distance -0.2798 0.118 -2.381 0.017 -0.510 -0.049 X 0.1243 0.124 1.001 0.317 -0.119 0.368 C -0.0300 0.040 -0.750 0.454 -0.109 0.048 X2 0.0014 0.001 1.422 0.155 -0.001 0.003 C2 0.0041 0.003 1.398 0.162 -0.002 0.010 AX -0.1251 0.118 -1.063 0.288 -0.356 0.105 ============================================================================== .. GENERATED FROM PYTHON SOURCE LINES 431-435 Testing model fit ---------------------------- Every time we make a model, it is important to test it. We test our logistic regression model using Mcfaddens Rsquared and ROC curve. .. GENERATED FROM PYTHON SOURCE LINES 435-473 .. code-block:: default # Mcfaddens Rsquared for Logistic regression null_model = smf.glm(formula="Goal ~ 1 ", data=shots, family=sm.families.Binomial()).fit() print("Mcfaddens Rsquared", 1 - test_model.llf / null_model.llf) # ROC curve numobs = 100 TP = np.zeros(numobs) FP = np.zeros(numobs) TN = np.zeros(numobs) FN = np.zeros(numobs) for i, threshold in enumerate(np.arange(0, 1, 1 / numobs)): for j, shot in shots.iterrows(): if (shot['Goal'] == 1): if (shot['xG'] > threshold): TP[i] = TP[i] + 1 else: FN[i] = FN[i] + 1 if (shot['Goal'] == 0): if (shot['xG'] > threshold): FP[i] = FP[i] + 1 else: TN[i] = TN[i] + 1 fig, ax = plt.subplots() ax.plot(FP / (FP + TN), TP / (TP + FN), color='black') ax.plot([0, 1], [0, 1], linestyle='dotted', color='black') ax.set_ylabel("Predicted to score and did TP/(TP+FN))") ax.set_xlabel("Predicted to score but didn't FP/(FP+TN)") plt.ylim((0.00, 1.00)) plt.xlim((0.00, 1.00)) ax.spines['top'].set_visible(False) ax.spines['right'].set_visible(False) .. image-sg:: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_013.png :alt: plot xGModelFit :srcset: /gallery/lesson2/images/sphx_glr_plot_xGModelFit_013.png :class: sphx-glr-single-img .. rst-class:: sphx-glr-script-out .. code-block:: none Mcfaddens Rsquared 0.13708006325049105 .. GENERATED FROM PYTHON SOURCE LINES 474-479 Challenge ---------------------------- 1) Create different models for headers and non-headers (as suggested in `Measuring the Effectiveness of Playing Strategies at Soccer, Pollard (1997) `_)! 2) Assign to penalties xG = 0.8! 3) Find out which player had the highest xG in 2017/18 Premier League season! .. rst-class:: sphx-glr-timing **Total running time of the script:** ( 0 minutes 25.636 seconds) .. _sphx_glr_download_gallery_lesson2_plot_xGModelFit.py: .. only:: html .. container:: sphx-glr-footer sphx-glr-footer-example .. container:: sphx-glr-download sphx-glr-download-python :download:`Download Python source code: plot_xGModelFit.py ` .. container:: sphx-glr-download sphx-glr-download-jupyter :download:`Download Jupyter notebook: plot_xGModelFit.ipynb ` .. only:: html .. rst-class:: sphx-glr-signature `Gallery generated by Sphinx-Gallery `_