.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "gallery/plot_UsingWyscout.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_gallery_plot_UsingWyscout.py:


Using Wyscout
=====================

Getting familiar with Wyscout data

.. GENERATED FROM PYTHON SOURCE LINES 7-14

.. code-block:: default


    # import the necessary libraries
    import json
    import os
    import pathlib

    import pandas as pd

.. GENERATED FROM PYTHON SOURCE LINES 15-19

Competition data
----------------------------
This dataframe lists the competitions available in the dataset together with their Wyscout ids (*wyId*).
If you run the code locally, comment out the active ``path = ...`` line (put ``#`` in front of it) and uncomment the line below it (delete the ``#``).

.. GENERATED FROM PYTHON SOURCE LINES 19-33

.. code-block:: default


    # path to the data
    path = os.path.join(str(pathlib.Path().resolve()), 'data', 'Wyscout', 'competitions.json')  # put # in front if used locally
    #path = os.path.join(str(pathlib.Path().resolve()), 'Wyscout', 'competitions.json')  # delete #
    # open the data
    with open(path) as f:
        data = json.load(f)
    # save it in a dataframe
    df_competitions = pd.DataFrame(data)
    # structure of the data
    df_competitions.info()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    RangeIndex: 7 entries, 0 to 6
    Data columns (total 5 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   name    7 non-null      object
     1   wyId    7 non-null      int64
     2   format  7 non-null      object
     3   area    7 non-null      object
     4   type    7 non-null      object
    dtypes: int64(1), object(4)
    memory usage: 408.0+ bytes

.. GENERATED FROM PYTHON SOURCE LINES 34-38

Match data
----------------------------
This dataframe contains all games played in the 2017/18 Premier League season.
*wyId* is the unique id of a match in the Wyscout database.

..
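
Besides *wyId*, every match row has a *teamsData* field (see the column list printed after the match data is loaded) that holds per-team information in a nested dictionary. A minimal sketch of how it could be flattened into home and away columns, assuming that *teamsData* maps a team id (stored as a string) to a dictionary with at least a ``side`` key (``'home'`` or ``'away'``) and a ``score`` key. The records below are made-up toy values, not real matches, and the helper ``split_sides`` is hypothetical.

.. code-block:: python

    import pandas as pd

    # Toy match records (made-up ids and scores) that mimic the ASSUMED
    # shape of the Wyscout "teamsData" field: team id (as a string) ->
    # dict with at least a "side" ("home"/"away") and a "score".
    toy_matches = [
        {"wyId": 1, "teamsData": {"10": {"side": "home", "score": 2},
                                  "20": {"side": "away", "score": 1}}},
        {"wyId": 2, "teamsData": {"30": {"side": "away", "score": 0},
                                  "10": {"side": "home", "score": 0}}},
    ]
    df_toy = pd.DataFrame(toy_matches)


    def split_sides(teams_data):
        """Hypothetical helper: flatten one teamsData dict into home/away columns."""
        row = {}
        for team_id, info in teams_data.items():
            side = info["side"]  # "home" or "away"
            row[side + "TeamId"] = int(team_id)
            row[side + "Score"] = info["score"]
        return row


    # one flat row per match, aligned with the original index
    sides = pd.DataFrame([split_sides(td) for td in df_toy["teamsData"]],
                         index=df_toy.index)
    df_flat = pd.concat([df_toy[["wyId"]], sides], axis=1)
    print(df_flat)

On the real data the same comprehension would run over ``df_matches['teamsData']``; since the key names above are assumptions, inspect one record first (for example ``df_matches['teamsData'].iloc[0]``).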

.. GENERATED FROM PYTHON SOURCE LINES 38-50

.. code-block:: default


    # path to the data
    path = os.path.join(str(pathlib.Path().resolve()), 'data', 'Wyscout', 'matches_England.json')  # put # in front if used locally
    #path = os.path.join(str(pathlib.Path().resolve()), 'Wyscout', 'matches_England.json')  # delete #
    # open the data
    with open(path) as f:
        data = json.load(f)
    # save it in a dataframe
    df_matches = pd.DataFrame(data)
    # structure of the data
    df_matches.info()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    RangeIndex: 380 entries, 0 to 379
    Data columns (total 14 columns):
     #   Column         Non-Null Count  Dtype
    ---  ------         --------------  -----
     0   status         380 non-null    object
     1   roundId        380 non-null    int64
     2   gameweek       380 non-null    int64
     3   teamsData      380 non-null    object
     4   seasonId       380 non-null    int64
     5   dateutc        380 non-null    object
     6   winner         380 non-null    int64
     7   venue          380 non-null    object
     8   wyId           380 non-null    int64
     9   label          380 non-null    object
     10  date           380 non-null    object
     11  referees       380 non-null    object
     12  duration       380 non-null    object
     13  competitionId  380 non-null    int64
    dtypes: int64(6), object(8)
    memory usage: 41.7+ KB

.. GENERATED FROM PYTHON SOURCE LINES 51-56

Player data
----------------------------
This dataframe contains all players available in the Wyscout public dataset.
*wyId* is the player's id in the Wyscout database, and *currentTeamId* is the id of the team the player currently plays for.
*shortName* holds an abbreviated form of the player's name, which makes it convenient for visualizations and rankings.

.. GENERATED FROM PYTHON SOURCE LINES 56-69

.. code-block:: default


    # path to the data
    path = os.path.join(str(pathlib.Path().resolve()), 'data', 'Wyscout', 'players.json')  # put # in front if used locally
    #path = os.path.join(str(pathlib.Path().resolve()), 'Wyscout', 'players.json')  # delete #
    # open the data
    with open(path) as f:
        data = json.load(f)
    # save it in a dataframe
    df_players = pd.DataFrame(data)
    # structure of the data
    df_players.info()

..

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    RangeIndex: 3603 entries, 0 to 3602
    Data columns (total 14 columns):
     #   Column                 Non-Null Count  Dtype
    ---  ------                 --------------  -----
     0   passportArea           3603 non-null   object
     1   weight                 3603 non-null   int64
     2   firstName              3603 non-null   object
     3   middleName             3603 non-null   object
     4   lastName               3603 non-null   object
     5   currentTeamId          3512 non-null   object
     6   birthDate              3603 non-null   object
     7   height                 3603 non-null   int64
     8   role                   3603 non-null   object
     9   birthArea              3603 non-null   object
     10  wyId                   3603 non-null   int64
     11  foot                   3603 non-null   object
     12  shortName              3603 non-null   object
     13  currentNationalTeamId  3603 non-null   object
    dtypes: int64(3), object(11)
    memory usage: 394.2+ KB

.. GENERATED FROM PYTHON SOURCE LINES 70-83

Event data
----------------------------
This dataframe contains all events that occurred in the games of the 2017/18 Premier League season.
*matchId* matches *wyId* in *df_matches*, and *playerId* matches *wyId* in *df_players*.
*tags* provide additional characteristics of an event, for example whether a pass was accurate.
The location of an event, such as the start and end of a pass, is stored in *positions*; remember that the coordinates are given on a 100x100 grid with an inverted y-axis.
*eventName* gives the basic type of an event, whereas *subEventName* describes it in more detail.
*eventSec* is the time of an event in seconds since the start of the current half (see *matchPeriod*).
If you want to learn more about Wyscout data, you can explore `WyScout API `_, but remember to switch the version to 2.0 at the top of the page.

Because of the file size limit of the webpage, the event data are split into 13 files, which the code below loads and concatenates.
If the data are stored in your working directory as a single file, comment out (put ``#`` in front of) the lines marked ``# put # in front if used locally`` and uncomment (delete the ``#`` from) the lines marked ``# delete #``.

..
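
The *positions* and *tags* columns are nested lists, so they usually need a little unpacking before analysis. A minimal sketch on made-up toy passes, assuming that *positions* is a list of ``{"x": ..., "y": ...}`` points (start first, end last), that tag id ``1801`` marks an accurate event (``1802`` an inaccurate one), and a 105 x 68 m pitch; the pitch size and the helper ``to_metres`` are assumptions for illustration only.

.. code-block:: python

    import pandas as pd

    # ASSUMED pitch size in metres (Wyscout coordinates are 0-100 percentages).
    PITCH_LENGTH = 105.0
    PITCH_WIDTH = 68.0
    # ASSUMED Wyscout tag id meaning "accurate".
    ACCURATE_TAG = 1801

    # Toy pass events (made-up values) in the assumed Wyscout layout.
    toy_events = pd.DataFrame({
        "eventName": ["Pass", "Pass"],
        "positions": [[{"x": 50, "y": 50}, {"x": 75, "y": 25}],
                      [{"x": 10, "y": 90}, {"x": 30, "y": 100}]],
        "tags": [[{"id": 1801}], [{"id": 1802}]],
    })


    def to_metres(point):
        """Hypothetical helper: map a 0-100 point to metres, flipping the inverted y-axis."""
        x = point["x"] * PITCH_LENGTH / 100
        y = (100 - point["y"]) * PITCH_WIDTH / 100
        return x, y


    # start (first point) and end (last point) of each pass in metres
    toy_events["start"] = toy_events["positions"].apply(lambda p: to_metres(p[0]))
    toy_events["end"] = toy_events["positions"].apply(lambda p: to_metres(p[-1]))
    # a pass counts as accurate if its tag list contains the "accurate" tag id
    toy_events["accurate"] = toy_events["tags"].apply(
        lambda tags: any(t["id"] == ACCURATE_TAG for t in tags))
    print(toy_events[["start", "end", "accurate"]])

Flipping the y-axis here means that, after conversion, larger y values lie higher up on a standard pitch drawing; if your plotting library already uses an inverted axis, skip the ``100 - y`` step.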

.. code-block:: default


    # collect one dataframe per event file
    frames = []  # put # in front if used locally
    for i in range(13):  # put # in front if used locally
        # get the file name and the path to it
        file_name = 'events_England_' + str(i + 1) + '.json'  # put # in front if used locally
        path = os.path.join(str(pathlib.Path().resolve()), 'data', 'Wyscout', file_name)  # put # in front if used locally
        # open the data
        with open(path) as f:  # put # in front if used locally
            data = json.load(f)  # put # in front if used locally
        # keep the data of this file
        frames.append(pd.DataFrame(data))  # put # in front if used locally
    # append all parts into one dataframe
    df_events = pd.concat(frames)  # put # in front if used locally

    #path = os.path.join(str(pathlib.Path().resolve()), 'Wyscout', 'events_England.json')  # delete #
    #with open(path) as f:  # delete #
    #    data = json.load(f)  # delete #
    #df_events = pd.DataFrame(data)  # delete #
    # structure of the data
    df_events.info()

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Int64Index: 643150 entries, 0 to 43149
    Data columns (total 12 columns):
     #   Column        Non-Null Count   Dtype
    ---  ------        --------------   -----
     0   eventId       643150 non-null  int64
     1   subEventName  643150 non-null  object
     2   tags          643150 non-null  object
     3   playerId      643150 non-null  int64
     4   positions     643150 non-null  object
     5   matchId       643150 non-null  int64
     6   eventName     643150 non-null  object
     7   teamId        643150 non-null  int64
     8   matchPeriod   643150 non-null  object
     9   eventSec      643150 non-null  float64
     10  subEventId    643150 non-null  object
     11  id            643150 non-null  int64
    dtypes: float64(1), int64(5), object(6)
    memory usage: 63.8+ MB

.. GENERATED FROM PYTHON SOURCE LINES 107-110

Before you start
----------------------------
Run these lines in Spyder or a Jupyter notebook and explore the dataframes to get familiar with them before you start working on the course.

.. rst-class:: sphx-glr-timing

**Total running time of the script:** ( 0 minutes 8.662 seconds)

..
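
Two things are worth knowing while exploring. First, the concatenated event dataframe keeps the row labels of each of the 13 files, which is why its index runs only "0 to 43149" for 643150 rows, so labels are not unique. Second, events can be linked to players because *playerId* in the events corresponds to *wyId* in the players. A minimal sketch on made-up toy frames (not the real data) that resets the index, merges the two tables and counts passes per *shortName*:

.. code-block:: python

    import pandas as pd

    # Toy stand-ins (made-up values) for df_players and df_events.
    df_players = pd.DataFrame({
        "wyId": [1, 2],
        "shortName": ["A. Player", "B. Player"],
    })
    # Two event "files" whose indices both start at 0, as when the
    # event files are concatenated without ignore_index.
    part1 = pd.DataFrame({"playerId": [1, 1], "eventName": ["Pass", "Shot"]})
    part2 = pd.DataFrame({"playerId": [2, 1], "eventName": ["Pass", "Pass"]})
    df_events = pd.concat([part1, part2])
    print(df_events.index.has_duplicates)  # True: labels 0 and 1 appear twice

    # give every event a unique row label
    df_events = df_events.reset_index(drop=True)

    # attach player names: playerId in events corresponds to wyId in players
    merged = df_events.merge(df_players, left_on="playerId", right_on="wyId", how="left")

    # number of passes per player, using the short name for readability
    passes = (merged[merged["eventName"] == "Pass"]
              .groupby("shortName")
              .size()
              .sort_values(ascending=False))
    print(passes)

Passing ``ignore_index=True`` to ``pd.concat`` would avoid the duplicated labels in the first place; ``how="left"`` keeps events whose *playerId* has no match in the players table (their *shortName* becomes missing).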