Player similarity analysis built on the FIFA 22 dataset. Users define a custom player profile — position and attribute ratings — and the system returns the most comparable real players using cosine similarity.
Includes a training notebook (training.ipynb) and a FastAPI app (app.py) backed by players_22.csv.
TL;DR
What this is: A sports analytics project that lets users build a player profile and find similar FIFA 22 players based on seven comparable attributes.
What this isn’t: A production scouting platform. Results are useful but would benefit from additional filters and features.
Stack: Python, pandas, scikit-learn, FastAPI.
Problem Statement
Let users build a player profile from attributes they care about, then surface similar real players from the dataset. That similarity layer can support further use cases — team formation ideas, recruitment comparisons, or game-like player discovery.
Dataset
- Source: Kaggle — FIFA 22 player data
- Selection: Multiple datasets were evaluated for completeness and fit; FIFA 22 was chosen because its video-game-style ratings give a general audience a familiar, accessible experience
- Earlier exploration: Data from fbref and Transfermarkt was considered, but scraping added complexity without matching the intended UX
Data Cleaning and Preprocessing
- Removed missing and inconsistent entries
- Normalized and standardized numerical values
- Handled outliers and imbalanced data where needed
- Mapped detailed position codes into broader roles (attacker, midfielder, defender); goalkeepers are excluded from similarity search
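The role mapping in the last step could look roughly like the sketch below. The grouping of individual position codes is an assumption made for illustration, as is the choice to use only the first code listed for each player; the notebook may group them differently.

```python
# Hypothetical grouping of FIFA position codes into the three broad roles.
# Goalkeepers ("GK") are deliberately left out so they map to None.
ROLE_BY_CODE = {
    "ST": "attacker", "CF": "attacker", "LW": "attacker", "RW": "attacker",
    "CAM": "midfielder", "CM": "midfielder", "CDM": "midfielder",
    "LM": "midfielder", "RM": "midfielder",
    "CB": "defender", "LB": "defender", "RB": "defender",
    "LWB": "defender", "RWB": "defender",
}


def broad_role(player_positions):
    """Return the broad role for the first listed position code.

    Returns None for goalkeepers and unknown codes so callers can drop them.
    """
    primary = player_positions.split(",")[0].strip().upper()
    return ROLE_BY_CODE.get(primary)


# Made-up rows shaped like the player_positions column.
rows = [
    {"short_name": "Player A", "player_positions": "RW, ST, CF"},
    {"short_name": "Player B", "player_positions": "GK"},
    {"short_name": "Player C", "player_positions": "CDM, CM"},
]

# Keep only outfield players and attach their broad role.
kept = [
    dict(row, role=broad_role(row["player_positions"]))
    for row in rows
    if broad_role(row["player_positions"]) is not None
]
print(kept)
```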
Feature Selection
Seven features drive comparison:
| Feature | Dataset column |
|---|---|
| Position | player_positions |
| Speed | pace |
| Passing | passing |
| Dribbling | dribbling |
| Defense | defending |
| Physic | physic |
| Shooting | shooting |
These were chosen because they directly influence player comparison and are easy for a general user to fill in.
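As a rough illustration, the seven columns can be pulled out of a CSV with the standard library alone. This is a sketch rather than the project's loader (which uses pandas); the `short_name` column and the rule of skipping rows with a missing rating are assumptions.

```python
import csv
import io

# The six numeric columns from the table above; position is kept as text.
FEATURE_COLUMNS = ["pace", "passing", "dribbling", "defending", "physic", "shooting"]


def load_features(csv_file):
    """Yield one dict per player with name, position string and six ratings.

    Rows with any missing rating are skipped.
    """
    for row in csv.DictReader(csv_file):
        if any(not row.get(col) for col in FEATURE_COLUMNS):
            continue
        record = {
            "short_name": row["short_name"],  # assumed name column
            "player_positions": row["player_positions"],
        }
        record.update({col: float(row[col]) for col in FEATURE_COLUMNS})
        yield record


# A tiny in-memory stand-in for players_22.csv with made-up values.
sample = io.StringIO(
    "short_name,player_positions,pace,passing,dribbling,defending,physic,shooting\n"
    'Player A,"RW, ST",85,91,95,34,65,92\n'
    "Player B,GK,,,,,,\n"
)
players = list(load_features(sample))
print(players)
```

With the real file, the same function would be called on `open("players_22.csv", newline="", encoding="utf-8")`.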
Encoding and Similarity
One-hot encoding for position — preserves category uniqueness without implying order between positions.
MinMax scaling for numerical attributes before combining feature vectors.
Cosine similarity to rank players:
- Measures orientation rather than magnitude
- Works well with high-dimensional sparse data
- Ignores a vector's overall length; differences in per-feature scale are handled by the MinMax step, not by cosine similarity itself
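The three steps can be sketched end to end in plain Python. This version avoids scikit-learn so that it stays dependency-free; the fixed 0 to 100 bounds stand in for the minimum and maximum a fitted MinMax scaler would learn from the data, and the `role` and `name` keys and all player values are made up for illustration.

```python
import math

ROLES = ["attacker", "midfielder", "defender"]
ATTRIBUTES = ["pace", "passing", "dribbling", "defending", "physic", "shooting"]


def encode(player, lo=0.0, hi=100.0):
    """One-hot role followed by the six ratings scaled to [0, 1]."""
    one_hot = [1.0 if player["role"] == role else 0.0 for role in ROLES]
    scaled = [(player[attr] - lo) / (hi - lo) for attr in ATTRIBUTES]
    return one_hot + scaled


def cosine_similarity(u, v):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(query, players, k=3):
    """Return the names of the k players closest to the query profile."""
    q = encode(query)
    ranked = sorted(
        players,
        key=lambda p: cosine_similarity(q, encode(p)),
        reverse=True,
    )
    return [p["name"] for p in ranked[:k]]


def make(name, role, *ratings):
    """Build a player dict from a name, a role and six ratings."""
    return {"name": name, "role": role, **dict(zip(ATTRIBUTES, ratings))}


query = make("custom", "midfielder", 75, 85, 80, 65, 70, 72)
players = [
    make("Player A", "midfielder", 75, 85, 80, 65, 70, 72),
    make("Player B", "defender", 60, 55, 50, 85, 82, 35),
    make("Player C", "midfielder", 70, 80, 78, 60, 68, 70),
    make("Player D", "attacker", 90, 60, 85, 30, 65, 88),
]
print(most_similar(query, players))
```

Because the one-hot block contributes to the dot product only when roles match, players in the same role as the query tend to rank ahead of players in other roles.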
API
The FastAPI app loads players_22.csv, builds a similarity matrix at startup, and exposes:
POST /find_similar_players/
Request body:
{
  "player_positions": "midfielder",
  "speed": 75,
  "physic": 70,
  "defense": 65,
  "dribbling": 80,
  "shooting": 72,
  "passing": 85
}
Response:
{
  "similar_players": ["Player A", "Player B", "Player C"]
}
Returns the top 3 closest matches by cosine similarity.
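A client call against this endpoint might look like the sketch below, using only the standard library. The base URL `http://localhost:8000` is an assumption (it is the usual default when serving a FastAPI app locally), and `build_request` and `find_similar_players` are hypothetical helper names, not part of app.py.

```python
import json
from urllib import request

ENDPOINT = "/find_similar_players/"


def build_request(profile, base_url="http://localhost:8000"):
    """Build the POST request carrying the profile as a JSON body."""
    body = json.dumps(profile).encode("utf-8")
    return request.Request(
        base_url.rstrip("/") + ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def find_similar_players(profile, base_url="http://localhost:8000"):
    """Send the profile to a running service and return the matched names."""
    with request.urlopen(build_request(profile, base_url)) as response:
        return json.load(response)["similar_players"]


# Same profile as the request body shown above.
profile = {
    "player_positions": "midfielder",
    "speed": 75,
    "physic": 70,
    "defense": 65,
    "dribbling": 80,
    "shooting": 72,
    "passing": 85,
}

# Building the request needs no network access; sending it does.
req = build_request(profile)
print(req.get_method(), req.full_url)
```

Once the service is running, `find_similar_players(profile)` would return the list from the `similar_players` field of the response.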
Project Files
| File | Purpose |
|---|---|
| app.py | FastAPI service for similarity queries |
| training.ipynb | Exploratory analysis and model development |
| players_22.csv | FIFA 22 player dataset |
Conclusion
The project demonstrates cosine similarity for sports analytics — useful for team formation, recruitment comparison, and performance benchmarking.
Future work: Add more features and filters on top of similarity search to narrow results when raw matches are not specific enough.