Similar Player Finder

Technologies

Python, FastAPI, pandas, scikit-learn, Jupyter

Player similarity analysis built on the FIFA 22 dataset. Users define a custom player profile — position and attribute ratings — and the system returns the most comparable real players using cosine similarity.

Includes a training notebook (training.ipynb) and a FastAPI app (app.py) backed by players_22.csv.

TL;DR
#

What this is: A sports analytics project that lets users build a player profile and find similar FIFA 22 players based on seven comparable attributes.

What this isn’t: A production scouting platform. Results are useful but would benefit from additional filters and features.

Stack: Python, pandas, scikit-learn, FastAPI.

Problem Statement
#

Let users build a player profile from attributes they care about, then surface similar real players from the dataset. That similarity layer can support further use cases — team formation ideas, recruitment comparisons, or game-like player discovery.

Dataset
#

  • Source: Kaggle — FIFA 22 player data
  • Selection: Multiple datasets were evaluated for completeness and fit; FIFA 22 was chosen for a simulation-game-like experience accessible to a general audience
  • Earlier exploration: Data from fbref and Transfermarkt was considered, but scraping added complexity without matching the intended UX

Data Cleaning and Preprocessing
#

  • Removed missing and inconsistent entries
  • Scaled numerical values to a common range (MinMax scaling, described under Encoding and Similarity)
  • Handled outliers and imbalanced data where needed
  • Mapped detailed position codes into broader roles (attacker, midfielder, defender); goalkeepers are excluded from similarity search
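
The role mapping in the last bullet could look roughly like the sketch below. The grouping in ROLE_BY_CODE, the helper to_role, the toy rows, and the choice to treat the first listed position as the primary one are all assumptions made for illustration; the project's actual mapping is not shown here.

```python
import pandas as pd

# Hypothetical grouping of detailed position codes into broad roles.
# The project's real grouping is not documented; this is one plausible choice.
ROLE_BY_CODE = {
    "ST": "attacker", "CF": "attacker", "LW": "attacker", "RW": "attacker",
    "CAM": "midfielder", "CM": "midfielder", "CDM": "midfielder",
    "LM": "midfielder", "RM": "midfielder",
    "CB": "defender", "LB": "defender", "RB": "defender",
    "LWB": "defender", "RWB": "defender",
}


def to_role(player_positions: str):
    """Map a comma-separated position string to a broad role via its first code."""
    primary = player_positions.split(",")[0].strip()
    return ROLE_BY_CODE.get(primary)  # None for GK or any unmapped code


# Toy rows standing in for players_22.csv (column names assumed).
players = pd.DataFrame({
    "short_name": ["Player A", "Player B", "Player C", "Player D"],
    "player_positions": ["RW, ST, CF", "CM, CDM", "CB", "GK"],
})
players["role"] = players["player_positions"].map(to_role)

# Goalkeepers (and unmapped codes) have no role and are dropped before similarity search.
outfield = players.dropna(subset=["role"]).reset_index(drop=True)
print(outfield[["short_name", "role"]])
```

Using only the first listed code keeps each player in exactly one role, which matches the single position value a user supplies in a profile.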

Feature Selection
#

Seven features drive comparison:

| Feature | Dataset column |
| --- | --- |
| Position | player_positions |
| Speed | pace |
| Passing | passing |
| Dribbling | dribbling |
| Defense | defending |
| Physic | physic |
| Shooting | shooting |

These were chosen because they directly influence player comparison and are easy for a general user to fill in.
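
One way to connect the user-facing names to the dataset columns is a plain lookup table. In this sketch the left-hand keys follow the request body shown in the API section, and FIELD_TO_COLUMN and profile_to_row are hypothetical names rather than code taken from app.py.

```python
# User-facing field name -> dataset column, following the feature table above.
FIELD_TO_COLUMN = {
    "player_positions": "player_positions",
    "speed": "pace",
    "passing": "passing",
    "dribbling": "dribbling",
    "defense": "defending",
    "physic": "physic",
    "shooting": "shooting",
}


def profile_to_row(profile: dict) -> dict:
    """Rename user-facing keys to dataset column names.

    Raises ValueError if the profile has unknown or missing fields.
    """
    unknown = set(profile) - set(FIELD_TO_COLUMN)
    missing = set(FIELD_TO_COLUMN) - set(profile)
    if unknown or missing:
        raise ValueError(f"unknown={sorted(unknown)}, missing={sorted(missing)}")
    return {FIELD_TO_COLUMN[key]: value for key, value in profile.items()}


row = profile_to_row({
    "player_positions": "midfielder",
    "speed": 75,
    "physic": 70,
    "defense": 65,
    "dribbling": 80,
    "shooting": 72,
    "passing": 85,
})
print(row)
```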

Encoding and Similarity
#

One-hot encoding for position — preserves category uniqueness without implying order between positions.

MinMax scaling for numerical attributes before combining feature vectors.
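
The two encoding steps above can be sketched as follows. The toy DataFrame, the role column, and the fixed role order are assumptions for illustration; the real pipeline would fit the scaler on the cleaned players_22.csv data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

ROLES = ["attacker", "midfielder", "defender"]  # assumed role order
NUMERIC = ["pace", "passing", "dribbling", "defending", "physic", "shooting"]

# Toy rows standing in for the cleaned dataset.
players = pd.DataFrame({
    "role": ["attacker", "midfielder", "defender"],
    "pace": [90, 70, 60],
    "passing": [70, 85, 55],
    "dribbling": [88, 80, 50],
    "defending": [35, 65, 85],
    "physic": [65, 70, 82],
    "shooting": [86, 72, 40],
})

# One-hot encode the role. A fixed category list keeps the column order stable
# even when a role is absent from the data.
role_series = players["role"].astype(pd.CategoricalDtype(categories=ROLES))
one_hot = pd.get_dummies(role_series, dtype=float).to_numpy()

# Scale each numeric attribute to [0, 1] based on its observed min and max.
scaler = MinMaxScaler()
scaled = scaler.fit_transform(players[NUMERIC])

# Combined feature vectors: 3 one-hot columns followed by 6 scaled attributes.
features = np.hstack([one_hot, scaled])
print(features.shape)
```

The fitted scaler should be kept around so that a user's profile can be transformed with the same minimum and maximum values as the dataset rows.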

Cosine similarity to rank players:

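  • Measures orientation rather than magnitude, so two profiles with the same balance of attributes score as similar even when their overall rating levels differ
  • Works well with high-dimensional sparse data in general, although the vectors here are small (a few one-hot position columns plus six scaled ratings)
  • Relies on the MinMax step above to keep any single attribute from dominating, since cosine similarity alone does not correct for differing feature scales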
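
A small NumPy sketch of the ranking step makes the "orientation rather than magnitude" point concrete. The function names, the three-dimensional toy vectors, and the player labels are all made up for illustration.

```python
import numpy as np


def cosine_to_rows(query: np.ndarray, matrix: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and every row of a matrix.

    Assumes no vector is all zeros (true whenever a one-hot column is set).
    """
    dots = matrix @ query
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    return dots / norms


def top_k(query, matrix, names, k=3):
    """Return the k (name, score) pairs with the highest cosine similarity."""
    scores = cosine_to_rows(query, matrix)
    order = np.argsort(scores)[::-1][:k]
    return [(names[i], float(scores[i])) for i in order]


names = ["Player A", "Player B", "Player C", "Player D"]
matrix = np.array([
    [0.90, 0.80, 0.10],  # same direction as the query
    [0.45, 0.40, 0.05],  # half the magnitude, same direction
    [0.10, 0.20, 0.90],  # very different balance
    [0.50, 0.50, 0.50],  # evenly balanced
])
query = np.array([0.90, 0.80, 0.10])

ranked = top_k(query, matrix, names, k=3)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Player B has exactly half of Player A's values yet scores essentially the same as Player A, because only the direction of the vector matters. This is one reason extra filters, such as a minimum overall rating, may be useful on top of the raw ranking.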

API
#

The FastAPI app loads players_22.csv, builds a similarity matrix at startup, and exposes:

POST /find_similar_players/

Request body:

{
  "player_positions": "midfielder",
  "speed": 75,
  "physic": 70,
  "defense": 65,
  "dribbling": 80,
  "shooting": 72,
  "passing": 85
}

Response:

{
  "similar_players": ["Player A", "Player B", "Player C"]
}

Returns the top 3 closest matches by cosine similarity.

Project Files
#

| File | Purpose |
| --- | --- |
| app.py | FastAPI service for similarity queries |
| training.ipynb | Exploratory analysis and model development |
| players_22.csv | FIFA 22 player dataset |

Conclusion
#

The project demonstrates cosine similarity for sports analytics — useful for team formation, recruitment comparison, and performance benchmarking.

Future work: Add more features and filters on top of similarity search to narrow results when raw matches are not specific enough.