An analysis of Counter-Strike: Global Offensive pro matches from 2015-2020 ("The Golden Age")¶


Summer 2025 Data Science Project¶

Contributions:

Bailey Jones: Project idea, dataset curation/preprocessing, data exploration, ML algo design/analysis, prose, formatting, the entire project and each of the checkpoints. Cleaned up C1. Did C2, C3, ML model.

Angelo Parker: Dataset curation, designed and completed C1 (CT vs T win rate).

Chris Duong: conclusion and project idea.

Jiho Lee: dataset curation.

Introduction¶

Counter-Strike: Global Offensive (CS:GO) was a competitive first-person shooter (FPS) video game available from August 2012 to September 2023, when it was replaced by its successor, Counter-Strike 2 (CS2).

Due to the competitive nature of CS:GO, community sites like HLTV.org record data from nearly every professional match played.

We would like to explore data from professional matches played between November 2015 and February 2020, a period some describe as the golden age of CS:GO.

Basic overview of CS:GO¶

In CS:GO, there are two sides: Terrorists (T) and Counter-Terrorists (CT). Teams will randomly start on either side, CT (Defense) or T (Offense), and then swap sides after completing 15 rounds.

To win a round, the T side must detonate a bomb on one of the two designated sites or eliminate all members of the CT side. Conversely, the CT side must defuse a planted bomb (or rescue a hostage in the Hostage Rescue game mode), eliminate all of the T side's players, or prevent a plant until the round timer expires.

All players start with a pistol at the start of the game/after death. Better weapons and equipment are bought at the start of each round with money, with individual performance (kills/assists/bomb defusal or plant, etc) granting a player money, as well as round wins that give the entire team money. Teams who lose rounds also get a small amount of money that increases every round lost in succession to prevent a "snowball" win for the other team.

Most competitive matches will play until one team reaches 16 round wins, but some overtime rules may be in place so that if a match is tied at match point (15-15), each team will play an additional 3 rounds per side until one team gets 4 wins. Ties of 18-18, 21-21, and so forth will trigger the overtime process to restart.
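The regulation and overtime rules above come down to a little arithmetic on the score. The sketch below is only an illustration of that arithmetic: `win_target` and `winner` are hypothetical helper names, and it assumes exactly the format described here (first to 16, then overtime blocks of 3 rounds per side that are won at 4 round wins).

```python
def win_target(score_a: int, score_b: int) -> int:
    """Round wins needed to take the map, given the current score."""
    low = min(score_a, score_b)
    if low < 15:
        # Regulation: the first team to reach 16 wins.
        return 16
    # Both teams have at least 15, so the map is in overtime.
    # Each overtime block is 6 rounds and is won at 4 round wins;
    # a 3-3 split (18-18, 21-21, ...) starts another block.
    completed_overtimes = (low - 15) // 3
    return 19 + 3 * completed_overtimes


def winner(score_a: int, score_b: int):
    """Return 1 or 2 if that team has won the map, otherwise None."""
    target = win_target(score_a, score_b)
    if score_a >= target:
        return 1
    if score_b >= target:
        return 2
    return None
```

For example, `winner(16, 14)` returns 1, while `winner(19, 18)` returns `None`: the score passed through 18-18, so a second overtime is underway and the target has moved to 22.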

A professional series is usually played as a best-of-X, with each game on a different map (X = 1, 2, or 3).

Our analysis¶

With the introduction out of the way, the dataset available to us should let us answer many questions about CS:GO. A common complaint in public chat during a match is "Ugh, why are we losing?", with the response being "Because this map is so CT-sided it's unfair". We would like to test claims like this, along with others, such as how closely a player's damage output tracks their overall performance.

Data Preprocessing¶

We have chosen to use a dataset from Kaggle (CS:GO Professional Matches) that was pre-scraped from HLTV.org and includes data from November 2015 to February 2020.

Our dataset consists of 4 CSV files: economy.csv, results.csv, players.csv, and picks.csv.

After a late revision, we decided not to include economy.csv in any of our evaluation.

In [ ]:
import pandas as pd
import seaborn as sns
from statsmodels.stats.proportion import proportions_ztest
import matplotlib.pyplot as plt

economy_df = pd.read_csv("economy.csv", low_memory=False)
results_df = pd.read_csv("results.csv")
players_df = pd.read_csv("players.csv")
picks_df = pd.read_csv("picks.csv")

First, the players.csv file is massive (around 130MB). It contains a lot of information we don't need for our purposes, such as individual statistics for each game within a pro match. We can cut a significant portion of the fat by focusing only on the end result of each pro match (for example, the combined stats after a best-of-3).

In [ ]:
essential_columns = [
    'date', 'player_name', 'team', 'opponent', 'country',
    'player_id', 'match_id', 'event_id', 'event_name', 'best_of',
    'kills', 'assists', 'deaths', 'hs', 'flash_assists',
    'kast', 'kddiff', 'adr', 'fkdiff', 'rating',
    'kills_ct', 'deaths_ct', 'kddiff_ct', 'adr_ct', 'kast_ct', 'rating_ct',
    'kills_t', 'deaths_t', 'kddiff_t', 'adr_t', 'kast_t', 'rating_t'
]

players_df = players_df[essential_columns].copy()
display(players_df)
date player_name team opponent country player_id match_id event_id event_name best_of ... kddiff_ct adr_ct kast_ct rating_ct kills_t deaths_t kddiff_t adr_t kast_t rating_t
0 2020-02-26 Brehze Evil Geniuses Liquid United States 9136 2339385 4901 IEM Katowice 2020 3 ... 4.0 81.6 79.2 1.10 23.0 31.0 -8.0 77.5 60.0 0.97
1 2020-02-26 CeRq Evil Geniuses Liquid Bulgaria 11219 2339385 4901 IEM Katowice 2020 3 ... 12.0 77.4 72.9 1.16 17.0 29.0 -12.0 63.9 54.3 0.73
2 2020-02-26 EliGE Liquid Evil Geniuses United States 8738 2339385 4901 IEM Katowice 2020 3 ... 14.0 96.6 71.4 1.39 24.0 34.0 -10.0 64.2 64.6 0.86
3 2020-02-26 Ethan Evil Geniuses Liquid United States 10671 2339385 4901 IEM Katowice 2020 3 ... 10.0 74.0 75.0 1.11 10.0 31.0 -21.0 37.8 51.4 0.43
4 2020-02-26 NAF Liquid Evil Geniuses Canada 8520 2339385 4901 IEM Katowice 2020 3 ... 11.0 96.3 85.7 1.36 24.0 29.0 -5.0 61.0 70.8 0.87
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
383312 2015-10-07 kIMERA ExAequo RIP Fonty Italy 7607 2298497 1957 Milan Games Week 2015 League by FACEIT 2 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
383313 2015-10-07 morphiw0w ExAequo RIP Fonty Italy 9752 2298497 1957 Milan Games Week 2015 League by FACEIT 2 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
383314 2015-10-07 overfly RIP Fonty ExAequo Italy 7698 2298497 1957 Milan Games Week 2015 League by FACEIT 2 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
383315 2015-10-07 simozor RIP Fonty ExAequo Italy 9753 2298497 1957 Milan Games Week 2015 League by FACEIT 2 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
383316 2015-10-07 xullE RIP Fonty ExAequo Italy 9754 2298497 1957 Milan Games Week 2015 League by FACEIT 2 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

383317 rows × 32 columns

Next, picks.csv has a lot of information we don't need, including some columns that look like internal identifiers for the site. We drop those.

In [ ]:
essential_columns = [
    'date', 'team_1', 'team_2', 'match_id', 'event_id', 'best_of',
    't1_removed_1', 't1_removed_2', 't1_removed_3',
    't2_removed_1', 't2_removed_2', 't2_removed_3',
    't1_picked_1', 't2_picked_1', 'left_over'
]
picks_df = picks_df[essential_columns].copy()
picks_df['date'] = pd.to_datetime(picks_df['date'], errors='coerce')
display(picks_df)
date team_1 team_2 match_id event_id best_of t1_removed_1 t1_removed_2 t1_removed_3 t2_removed_1 t2_removed_2 t2_removed_3 t1_picked_1 t2_picked_1 left_over
0 2020-03-18 TeamOne Recon 5 2340454 5151 3 Vertigo Train 0.0 Nuke Overpass 0.0 Dust2 Inferno Mirage
1 2020-03-18 Rugratz Bad News Bears 2340453 5151 3 Dust2 Nuke 0.0 Mirage Train 0.0 Vertigo Inferno Overpass
2 2020-03-18 New England Whalers Station7 2340461 5243 1 Mirage Dust2 Vertigo Nuke Train Overpass 0.0 0.0 Inferno
3 2020-03-17 Complexity forZe 2340279 5226 3 Inferno Nuke 0.0 Overpass Vertigo 0.0 Dust2 Train Mirage
4 2020-03-17 Singularity Endpoint 2340456 5247 3 Train Mirage 0.0 Nuke Inferno 0.0 Overpass Vertigo Dust2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
16030 2016-04-12 GODSENT Natus Vincere 2302059 2099 1 Dust2 Cobblestone Mirage Cache Inferno Overpass 0.0 0.0 Train
16031 2016-04-12 Liquid mousesports 2302058 2099 1 Inferno Train Mirage Overpass Cobblestone Cache 0.0 0.0 Dust2
16032 2016-04-12 Luminosity TYLOO 2302057 2099 1 Dust2 Cache Inferno Train Overpass Cobblestone 0.0 0.0 Mirage
16033 2016-04-12 FaZe Virtus.pro 2302063 2099 1 Overpass Cobblestone Cache Dust2 Inferno Mirage 0.0 0.0 Train
16034 2016-04-12 Tempo Storm Envy 2302064 2099 1 Cache Train Inferno Overpass Mirage Dust2 0.0 0.0 Cobblestone

16035 rows × 15 columns

Let's repeat the process with results.csv, removing unnecessary columns.

In [ ]:
essential_columns = [
    'date', 'team_1', 'team_2', '_map',
    'ct_1', 't_1', 'ct_2', 't_2',
    'map_winner', 'starting_ct',
    'match_winner', 'event_id', 'match_id'
]

results_df = results_df[essential_columns].copy()
display(results_df)
       date        team_1               team_2          _map         ct_1  t_1  ct_2  t_2  map_winner  starting_ct  match_winner  event_id  match_id
0      2020-03-18  Recon 5              TeamOne         Dust2           0    0    15    1           2            2             2      5151   2340454
1      2020-03-18  Recon 5              TeamOne         Inferno         8    5    10    6           2            2             2      5151   2340454
2      2020-03-18  New England Whalers  Station7        Inferno         9    3    10    6           2            1             2      5243   2340461
3      2020-03-18  Rugratz              Bad News Bears  Inferno         0    7     8    8           2            2             2      5151   2340453
4      2020-03-18  Rugratz              Bad News Bears  Vertigo         4    4    11    5           2            2             2      5151   2340453
...
45768  2015-11-05  G2                   E-frag.net      Inferno         8    5     9    7           2            1             2      1970   2299059
45769  2015-11-05  G2                   E-frag.net      Dust2          10    6     8    5           1            1             2      1970   2299059
45770  2015-11-04  CLG                  Liquid          Inferno         7    9     4    8           1            1             1      1934   2299011
45771  2015-11-03  NiP                  Dignitas        Train           4   12     3    1           1            2             1      1934   2299001
45772  2015-11-03  NiP                  Envy            Cobblestone     4   12     3    6           1            2             1      1934   2299003

45773 rows × 13 columns

C1: CT vs. T Round Win Rate By Map¶

In CS:GO, there are two sides: Terrorists (T) and Counter-Terrorists (CT). Teams will start on one side, CT (Defense) or T (Offense), and then swap sides after completing 15 rounds.

Each map is structured differently, so there may be inherent advantages for a certain side given a map's layout. One could assume that, given such a competitive game, all maps would have balanced win rates and it wouldn't matter whether a team was on the CT side or the T side.

The data found in results.csv might show a different picture. We will attempt to examine total rounds won by both CT and T sides on every map, and calculate win rates over time (the data is from 2015-2020). The results will show whether maps have balanced win rates, or whether certain maps offer implicit advantages when it comes to what side each team is on.

To test for statistical significance, we will use a 2-proportion Z-test with a null hypothesis of CT win rate = 50% and an alternative hypothesis of CT win rate ≠ 50%. This will test whether certain maps have implicit advantages for CT or T sides.
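As a point of reference, a hypothesis of the form "CT win rate = 50%" can also be checked with a one-sample proportion Z-test, which compares the observed CT share directly against 0.5. The sketch below uses only the standard library; the helper names and the round counts are made up for illustration and are not taken from the dataset. It also computes a pooled two-sample statistic on the complementary CT and T counts, which by the arithmetic of the formula comes out √2 times larger than the one-sample statistic.

```python
import math


def one_sample_z(successes: int, trials: int, p0: float = 0.5):
    """Z statistic and two-sided p-value for H0: proportion == p0."""
    p_hat = successes / trials
    # Standard error under the null hypothesis.
    se = math.sqrt(p0 * (1 - p0) / trials)
    z = (p_hat - p0) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def two_sample_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-proportion Z statistic for H0: p1 == p2."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


# Made-up counts for illustration: 5,300 CT round wins out of 10,000 rounds.
ct_wins, rounds = 5300, 10000
z_one, p_one = one_sample_z(ct_wins, rounds)
z_two = two_sample_z(ct_wins, rounds, rounds - ct_wins, rounds)
print(f"one-sample z = {z_one:.2f}, p = {p_one:.2e}")
print(f"two-sample z = {z_two:.2f}")
```

With these toy counts the one-sample statistic is 6.00 and the two-sample statistic is about 8.49. Either way a 53% CT share over 10,000 rounds is far from chance, but the gap between the two statistics matters for maps whose win rate sits close to 50%.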

In [ ]:
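# Two-proportion Z-test (statsmodels proportions_ztest)
# Null Hypothesis: CT win rate = 50%
# Alternative Hypothesis: CT win rate ≠ 50%
# Note: passing the CT and T totals as two samples of equal size gives a Z-statistic
# that is sqrt(2) times larger than a one-sample test of the CT share against 0.5
# Also plots the rolling CT win rate over time for each map (rolling window: 500 rows)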
results_df['date'] = pd.to_datetime(results_df['date'], errors='coerce')
results_df = results_df.dropna(subset=['date', '_map'])

ct_1 = results_df[['date', '_map', 'ct_1']].rename(columns={'ct_1': 'ct'})
ct_2 = results_df[['date', '_map', 'ct_2']].rename(columns={'ct_2': 'ct'})
t_1 = results_df[['date', '_map', 't_1']].rename(columns={'t_1': 't'})
t_2 = results_df[['date', '_map', 't_2']].rename(columns={'t_2': 't'})

ct = pd.concat([ct_1, ct_2]).sort_values('date').set_index('date')
t = pd.concat([t_1, t_2]).sort_values('date').set_index('date')

maps = ['Cache', 'Cobblestone', 'Dust2', 'Inferno', 'Mirage', 'Nuke', 'Overpass', 'Train', 'Vertigo']

rolling_ct_avg = {}
hypothesis_results = []

for map_name in maps:
    ct_map = ct[ct['_map'] == map_name]
    t_map = t[t['_map'] == map_name]

    ct_avg = ct_map['ct'].rolling(window=500, min_periods=20, center=True).sum()
    t_avg = t_map['t'].rolling(window=500, min_periods=20, center=True).sum()
    win_pct = (ct_avg / (ct_avg + t_avg)) * 100
    rolling_ct_avg[map_name] = win_pct

    ct_total = ct_map['ct'].sum()
    t_total = t_map['t'].sum()
    z_stat, p_val = proportions_ztest([ct_total, t_total], [ct_total + t_total, ct_total + t_total])

    hypothesis_results.append({
        'Map': map_name,
        'CT Round Wins': ct_total,
        'T Round Wins': t_total,
        'CT Round Win Rate Percentage': round((ct_total / (ct_total + t_total)) * 100, 2),
        'Z-Statistic': round(z_stat, 3),
        'p-value': p_val,
        'Significant? ': 'Yes' if p_val < 0.05 else 'No'
    })

hypothesis_df = pd.DataFrame(hypothesis_results).sort_values('CT Round Win Rate Percentage', ascending=False)
display(hypothesis_df)

plt.figure(figsize=(14, 7))

for map_name, win_pct_data in rolling_ct_avg.items():
    plt.plot(win_pct_data.index, win_pct_data.values, label=map_name, linewidth=1.5, alpha=0.9)

plt.axhline(50, color='black', linestyle='--', linewidth=1)
plt.xlabel('Date')
plt.ylabel('CT Round Win Percentage')
plt.title('CT Round Win Percentage Over Time by Map')
plt.legend(title='Map', loc='center left', bbox_to_anchor=(1, 0.5))
plt.grid(True, linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()
   Map          CT Round Wins  T Round Wins  CT Round Win Rate Percentage  Z-Statistic        p-value  Significant?
5  Nuke                 58561         48102                         54.90       45.290   0.000000e+00  Yes
7  Train                90147         76095                         54.23       48.740   0.000000e+00  Yes
6  Overpass             75508         67366                         52.85       30.463  8.113994e-204  Yes
4  Mirage              119334        111006                         51.81       24.540  5.557588e-133  Yes
3  Inferno              93465         96862                         49.11      -11.012   3.350468e-28  Yes
1  Cobblestone          43390         45332                         48.91       -9.220   2.960643e-20  Yes
2  Dust2                50899         54195                         48.43      -14.378   7.061988e-47  Yes
8  Vertigo               7559          8130                         48.18       -6.447   1.141327e-10  Yes
0  Cache                55647         61750                         47.40      -25.190  5.142636e-140  Yes

[Figure: CT Round Win Percentage Over Time by Map. Line plot, one line per map; x-axis: Date; y-axis: CT Round Win Percentage; dashed reference line at 50%.]

After analysis, we can see that, aggregated over the whole dataset, every map's CT round win rate differs from 50% by a statistically significant margin. The size of the tilt varies, from 47.40% on Cache to 54.90% on Nuke. With this many rounds per map, even a deviation of one or two percentage points is statistically significant.

For example, we can conclude that teams have statistically been more likely to win rounds if they are on the CT side on the map Nuke. Additionally, we can conclude that teams have statistically been more likely to win rounds if they are on the T side on the map Cache.

CS:GO regularly rotated maps in and out of competitive play to avoid stagnation and to allow for major overhauls. This explains why some lines start later than others (in Cache's case) or stay flat for some periods (Dust2/Inferno). There simply weren't games being played professionally on these maps during those times.

Because CS:GO was constantly adjusted with balance changes, we can see noticeable dips at certain points in time. For example, in February 2017, Valve brought a reworked version of Inferno into competitive play that significantly changed features of the map in an attempt to balance it. Players naturally develop new strategies over time to account for changes made by developers. A closer look at this history can be found here: The History of Inferno Banana Control.

C2: Has starting side ever mattered for match wins?¶

As an extension of the previous result on CT vs T round win rate, we would like to determine whether the side a team starts on affects whether they win the map as a whole (each row of results.csv is one map of a match).

After the 15th round, the teams swap sides. The score carries over, but all money is wiped, so economically the second half begins again as if from round 1. Intuitively, the side you randomly start on should not have an effect on who ultimately wins, but we would like to explore this possibility.

In [ ]:
results_df['ct_start_win'] = results_df['starting_ct'] == results_df['map_winner']

We would like to map out any changes in the win rate of CT-starting teams over time. As discussed previously, map reworks would aim to squash major discrepancies like this.

In [ ]:
rolling_window = 1500
plt.figure(figsize=(14, 7))

match_results = []

for map_name in maps:
    # All results on this map, in chronological order
    map_df = results_df[results_df['_map'] == map_name].sort_values('date')
    ct_wins = map_df['ct_start_win'].sum()
    total = len(map_df)

    # Same two-sample Z-test form as in C1 (CT-start wins vs. CT-start losses);
    # z_stat and p_val are computed here but not shown in the table below
    z_stat, p_val = proportions_ztest([ct_wins, total - ct_wins], [total, total])
    # Centered rolling share of maps won by the team that started on CT
    win_rate = map_df['ct_start_win'].rolling(window=rolling_window, min_periods=20, center=True).mean()

    match_results.append({
        'Map': map_name,
        'CT Start Wins': ct_wins,
        'Total Matches': total,
        'CT Start Win Rate Percentage': round((ct_wins / total) * 100, 2),
    })

    plt.plot(map_df['date'], win_rate * 100, label=map_name, alpha=0.9)

ct_match_df = pd.DataFrame(match_results).sort_values('CT Start Win Rate Percentage', ascending=False)
display(ct_match_df)

plt.axhline(50, color='gray', linestyle='--')
plt.xlabel("Date")
plt.ylabel("CT-starting Side Team Win Rate (%)")
plt.title(f"CT-Starting Side Team Match Win Rate (Rolling {rolling_window} Matches)")
plt.legend(title='Map', loc='center left', bbox_to_anchor=(1, 0.5))
plt.grid(True)
plt.tight_layout()
plt.show()
   Map          CT Start Wins  Total Matches  CT Start Win Rate Percentage
1  Cobblestone           1828           3513                         52.04
3  Inferno               3864           7485                         51.62
0  Cache                 2361           4613                         51.18
8  Vertigo                310            609                         50.90
2  Dust2                 2093           4114                         50.88
4  Mirage                4585           9021                         50.83
5  Nuke                  2114           4206                         50.26
7  Train                 3280           6566                         49.95
6  Overpass              2786           5625                         49.53

[Figure: CT-Starting Side Team Match Win Rate (Rolling 1500 Matches). Line plot, one line per map; x-axis: Date; y-axis: CT-starting Side Team Win Rate (%); dashed reference line at 50%.]

Interestingly, these results differ from the round win rates in C1. Some maps that are T-sided by round win rate, such as Cobblestone, Inferno, and Cache, still favor the team that starts on CT when it comes to winning the map within the timeframe of our dataset.

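Under the two-sample Z-test computed in the cell above (α = 0.05), Cobblestone, Inferno, Cache, and Mirage all show a statistically significant CT-start advantage, while the rest do not. The advantage is small in every case, at most about two percentage points, and a stricter one-sample test of the CT-start share against 50% supports it only for Cobblestone and Inferno.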
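To get a feel for how much uncertainty sits around these win rates, one option is a confidence interval for each proportion. The sketch below uses the Wilson score interval with the usual 1.96 normal quantile for roughly 95% coverage; `wilson_interval` is a hypothetical helper written for this illustration, and the counts are copied from the table above.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (about 95% by default)."""
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    return center - half_width, center + half_width


# CT-start wins and total maps played, copied from the table above.
counts = {
    "Cobblestone": (1828, 3513),
    "Inferno": (3864, 7485),
    "Cache": (2361, 4613),
    "Mirage": (4585, 9021),
}

for map_name, (wins, total) in counts.items():
    low, high = wilson_interval(wins, total)
    print(f"{map_name}: {low:.3f} to {high:.3f}")
```

Under these assumptions the intervals for Cobblestone and Inferno lie entirely above 0.5, while those for Cache and Mirage straddle it.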

Inferno, after receiving its rework in 2017, jumped up in CT-start wins almost immediately once teams figured out new strategies, as mentioned previously.

An early hypothesis is that rounds 1 and 2 of the match (the pistol round and the round after it) heavily favor the CT side because of the longer-range engagements on these maps.

This hypothesis could be supported by the fact that Cobblestone is such an outlier here, with its engagements being some of the longest in the game by distance.

C3: Graphing Pro Player ADR vs Rating¶

Average damage per round (ADR) is a pretty handy way of determining the impact an individual player had on the outcome of the match.

There are many ways of doing damage to your opponents in CS:GO, including grenade damage and direct gun damage. You might lose an engagement to someone that you never even saw, but you could have also damaged them with a well-placed Frag or Molotov around a corner, giving your surviving teammates an easier time taking them down.

Rating is a much more involved calculation of player impact in a match which you can read about here: Introducing Rating 2.0.

As of the data collection date, HLTV.org computed Rating from a combination of kill rating, survival rating, KAST rating, impact rating, and damage rating.

In simpler terms, 1.00 rating is average. Anything above is above average performance, and vice versa.

We want to see whether these two metrics of player performance are correlated.
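For readers who want to see what a Pearson coefficient measures, here is a minimal hand-rolled version: the covariance of the two series divided by the product of their spreads. The `pearson_r` helper and the five ADR/Rating pairs are made up for illustration and are not taken from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up ADR and Rating values for five players (not from the dataset).
adr = [55.0, 68.0, 74.5, 81.0, 96.0]
rating = [0.78, 0.95, 1.08, 1.10, 1.35]

r = pearson_r(adr, rating)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

Squaring r gives the share of variance in one metric that a straight-line fit on the other accounts for, which is often an easier number to interpret than r itself.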

In [ ]:
plt.figure(figsize=(10,6))
plt.scatter(players_df['adr'], players_df['rating'], alpha=0.6, s=0.3)
plt.title("Pro Player ADR vs Rating")
plt.xlabel("ADR (Average Damage per Round)")
plt.ylabel("Rating")
plt.grid(True)

plt.show()

correlation = players_df['adr'].corr(players_df['rating'])

display(f"Pearson correlation of ADR vs Rating: {correlation:.3f}")
[Figure: Pro Player ADR vs Rating. Scatter plot; x-axis: ADR (Average Damage per Round); y-axis: Rating.]
'Pearson correlation of ADR vs Rating: 0.880'

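With the sheer number of data points (over 300,000 rows), we can confidently conclude that there is a strong correlation between HLTV's in-depth Rating system and the basic ADR number.

A correlation of 0.880 means ADR alone accounts for roughly 77% of the variance in Rating (0.880² ≈ 0.77). Part of this is by construction, since damage rating is one of Rating's components. Even so, the relationship is strong enough that ADR on its own is a good first approximation of player performance.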

Predicting Map Picks with Machine Learning¶

At the start of a pro CS:GO match, the teams go through a short veto process. At any point in time, 7 maps chosen by the developer, Valve, are usually in the "active pool", with some taken out or put back in to make changes and keep the game from getting stale.

From this pool, each team bans two maps that they do not want to play on and picks one of their choice. This leaves one leftover map. Usually in a best-of-3 match, the 2 picked maps and then the leftover are played until either team wins.

An interesting observation follows from this. With your two bans, you'd like to remove maps that your team struggles on or that the opposing team dominates. With your pick, you'd choose the map you are best at. The leftover map, however, is one that neither team felt strongly enough about to ban or pick.

So, can we predict what this leftover map will be from the two teams involved and the bans and picks they made? Let's see. We will use Random Forest Classification to try.

In [ ]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, ConfusionMatrixDisplay, confusion_matrix
import numpy as np

# use every column except the target (left_over) and the date/ID columns
X = picks_df.drop(columns=['left_over', 'date', 'match_id', 'event_id'])

# one hot encoding
X = pd.get_dummies(X)

y = picks_df['left_over']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = RandomForestClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# get the top feature importances
importances = model.feature_importances_
indices = np.argsort(importances)[-10:]
top_feats = X.columns[indices]
top_vals = importances[indices]

# plot it
plt.figure(figsize=(10, 6))
plt.barh(top_feats, top_vals)
plt.xlabel("Feature Importance")
plt.title("Top 10 Feature Importances")
plt.tight_layout()
plt.show()

# show accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Random forest accuracy: {accuracy:.2%}")
[Figure: Top 10 Feature Importances. Horizontal bar chart; x-axis: Feature Importance; y-axis: one-hot encoded feature names.]
Random forest accuracy: 74.31%

To interpret these results: in the y-axis labels, t1/t2 refers to the team that picks first or second, and removed_1/2 is that team's first or second ban. The map name is self-explanatory.

In conclusion, our model reaches an accuracy of 74.31%. Interestingly, Mirage features heavily on this list. This is probably due to Mirage's status as a staple map in the game. Most teams enjoy playing it and are good at it, so when it is banned late in the veto (for example, as the 4th ban), the model can infer the leftover map with much greater confidence.
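One way to put the accuracy figure in context is an elimination baseline. By definition, the leftover map is the one map in the active pool that nobody banned or picked, so if the pool for a given date is known, the answer can be read off directly. The sketch below is an illustration under that assumption: `leftover_by_elimination` is a hypothetical helper, and the 7-map pool is inferred from the maps that appear in the most recent picks.csv rows shown earlier rather than taken from a dataset field.

```python
def leftover_by_elimination(map_pool, vetoed_or_picked):
    """Return the single map left in the pool, or None if it is ambiguous."""
    # Ignore placeholder values such as "0.0" that are not real map names.
    used = {m for m in vetoed_or_picked if m in map_pool}
    remaining = [m for m in map_pool if m not in used]
    return remaining[0] if len(remaining) == 1 else None


# Assumed active pool for early 2020 (an assumption, not a dataset column).
POOL_2020 = ["Dust2", "Inferno", "Mirage", "Nuke", "Overpass", "Train", "Vertigo"]

# Best-of-3 veto: each team bans two maps and picks one.
bo3_veto = ["Vertigo", "Train", "Nuke", "Overpass", "Dust2", "Inferno"]
# Best-of-1 veto: each team bans three maps; unused pick slots hold "0.0".
bo1_veto = ["Mirage", "Dust2", "Vertigo", "Nuke", "Train", "Overpass", "0.0", "0.0"]

print(leftover_by_elimination(POOL_2020, bo3_veto))  # Mirage
print(leftover_by_elimination(POOL_2020, bo1_veto))  # Inferno
```

The random forest is never told which maps are in the pool on a given date, so it has to learn this elimination logic, and the pool changes over time, from examples. That may be part of why its accuracy stops short of 100%.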

Conclusion¶

Our group set out to understand what makes CS:GO matches competitive and what factors help predict outcomes in a game. Since not everyone knows CS:GO the way a pro gamer does, we started by explaining the basics: how the Counter-Terrorist and Terrorist sides work, what teams are trying to accomplish each round, and how the in-game economy functions. This quick briefing helps readers understand the findings that follow.

Key Findings¶

Our statistical analysis revealed significant map-specific imbalances in CS:GO. The two-proportion Z-tests showed that Nuke and Train carry the largest Counter-Terrorist advantages, with CT round win rates of 54.90% and 54.23% respectively (p < 0.001). This supports the community's view of these maps as defensive. On the other hand, Cache, which was T-sided overall at 47.40%, showed a clear trend toward a 50/50 split during 2018-2019, suggesting map balancing efforts by Valve.

The modeling work also provided insight into prediction in esports. Our Random Forest classifier achieved 74.31% accuracy in predicting the leftover map from the ban/pick phase, indicating that the veto contains substantial information about team intentions and capabilities. The strong correlation (r = 0.880) between Average Damage per Round and HLTV Rating supports ADR as a reliable performance metric for player evaluation.

Implications for the CS:GO Community¶

These findings have practical applications for teams, analysts, and tournament organizers. Teams can leverage map-specific win rate data, for example by prioritizing CT-favored maps when they excel on defense. The meta trends suggest that analysts should pay closer attention to early bans and picks as indicators of team strategy and confidence.

Limitations and Future Work¶

Several limitations constrain our conclusions. The economic analysis remains incomplete, since we did not use economy.csv, which is a significant gap given the importance of the in-game economy in CS:GO strategy. Our dataset's temporal scope may not capture recent meta shifts, and it covers only professional matches recorded on HLTV, potentially limiting generalizability to other competitive levels.

Future research should integrate round-by-round economic data to build more comprehensive predictive models. Additionally, investigating how teams might adapt their strategies based on these statistical insights could bridge the gap between data science and practical competitive application. Expanding the analysis to include communication patterns and in-game decision-making would provide even richer insights into what drives success in professional CS:GO.

This tutorial demonstrates how data science techniques can illuminate the strategic depth of competitive gaming, providing both statistical rigor and practical insights for the esports community.