작성일자 : 2024-09-25
수정일자 : 2024-09-29
Ver 0.1.2
1. 데이터 준비
Terra Mystica Snellman Statistics
Terra Mystica Snellman Statistics
Game logs and statistics from online games of Terra Mystica
www.kaggle.com
2. EDA - 데이터 파악
# 모듈 import
import os
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
# Directory 확인
os.getcwd()
'/Users/limjongjun/Desktop/JayJay/Growth/Python/Game_Analysis/ipynb'
Game Event
# 데이터 Directory 할당 및 변경
data_dir = '/Users/limjongjun/Desktop/JayJay/Growth/Python/Game_Analysis/Data'
os.chdir(data_dir)
os.getcwd()
'/Users/limjongjun/Desktop/JayJay/Growth/Python/Game_Analysis/Data'
# 데이터 불러오기 및 정보 확인
df = pd.read_csv('game_events.csv', chunksize = 1000) # 적당한 청크 크기 설정
# 샘플링을 위해 각 청크에서 일정 비율로 데이터 추출
game_events_sample = pd.concat([chunk.sample(frac = 0.01) for chunk in df]) # 예: 1% 샘플링
print(game_events_sample.info())
print('\n')
print('------------')
print('\n')
print(game_events_sample.head())
print('\n')
print('------------')
print('\n')
print("데이터 구조 확인", game_events_sample.shape)
<class 'pandas.core.frame.DataFrame'>
Index: 734191 entries, 678 to 73419021
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 event 734191 non-null object
1 faction 734191 non-null object
2 game 734183 non-null object
3 num 734191 non-null int64
4 round 734191 non-null int64
5 turn 734191 non-null int64
dtypes: int64(3), object(3)
memory usage: 39.2+ MB
None
------------
event faction game num round turn
678 upgrade:SA all 50615 1 6 3
319 leech:count dragonlords 50615 1 1 2
581 town:TW7 all 50615 1 4 2
676 upgrade:SA all 50615 1 6 1
694 build:D all 50615 1 5 4
------------
데이터 구조 확인 (734191, 6)
# columns 확인
print("<colums 확인>")
print(game_events_sample.columns)
<colums 확인>
Index(['event', 'faction', 'game', 'num', 'round', 'turn'], dtype='object')
- game: String used as a unique identifier for the game
- event: String describing various events occurring within the game (ex: vp, upgrade:TP, build:D etc).
- faction: String representing the camp associated with the event.
- num: An integer that represents the numerical value or frequency of occurrence of an event.
- round: An integer that represents the round of the game in which the event occurred.
- turn: An integer that represents the turn on which the event occurred.
print("dtype이 'object'인 컬럼 이름만 리스트로 가져오기")
print('\n')
# dtype이 'object'인 컬럼 이름만 리스트로 가져오기
object_columns = game_events_sample.select_dtypes(include='object').columns.tolist()
print(object_columns)
print('\n')
print('------------')
print('\n')
# 컬럼별 Unique 값 확인
print("컬럼별 Unique 값 확인")
print('\n')
for i in object_columns :
print(game_events_sample[i].unique())
print('\n')
print('------------')
print('\n')
dtype이 'object'인 컬럼 이름만 리스트로 가져오기
['event', 'faction', 'game']
------------
컬럼별 Unique 값 확인
['upgrade:SH' 'order:2' 'order:1' 'vp' 'upgrade:TP' 'town:TW1' 'pass:BON4'
'action:ACT2' 'pass:BON5' 'bridge' 'pass:BON6' 'dig' 'leech:pw'
'town:any' 'pass:BON9' 'leech:count' 'upgrade:TE' 'action:ACT1'
'favor:FAV7' 'favor:any' 'build:D' 'burn' 'action:ACT3' 'advance:ship'
'action:ACT4' 'town:TW4' 'action:FAV6' 'action:ACT5' 'pass:BON10'
'mermaid:connect' 'favor:FAV6' 'town:TW7' 'pick-color:brown' 'decline:pw'
'decline:count' 'favor:FAV12' 'action:BON1' 'pass:BON7' 'pass:BON1'
'upgrade:SA' 'order:4' 'favor:FAV11' 'favor:FAV4' 'pick-color:red'
'pass:BON3' 'order:3' 'town:TW3' 'favor:FAV3' 'town:TW2' 'advance:dig'
'pick-color:blue' 'action:ACT6' 'pass:BON8' 'favor:FAV10' 'town:TW6'
'favor:FAV5' 'pick-color:green' 'town:TW5' 'town:TW8' 'favor:FAV1'
'favor:FAV2' 'action:BON2' 'pass:BON2' 'action:ACTN' 'order:5'
'favor:FAV8' 'favor:FAV9' 'action:ACTA' 'action:ACTE' 'action:ACTW'
'leech-from-cultist:pw' 'decline-from-cultist:count'
'leech-from-cultist:count' 'decline-from-cultist:pw' 'cultist:cult'
'cultist:pw' 'drop' 'action:ACTS' 'action:ACTC' 'action:ACTH2'
'pick-color:black' 'pick-color:gray' 'action:ACTH1' 'pick-color:yellow'
'action:ACTG' 'order:6' 'action:ACTH5' 'action:ACTH3' 'action:ACTH4'
'order:7' 'action:ACTH6']
------------
['halflings' 'all' 'mermaids' 'dragonlords' 'riverwalkers' 'yetis'
'darklings' 'chaosmagicians' 'engineers' 'icemaidens' 'nomads' 'auren'
'fakirs' 'witches' 'cultists' 'alchemists' 'swarmlings' 'dwarves'
'shapeshifters' 'nofaction1' 'nofaction4' 'acolytes' 'giants'
'nofaction3' 'nofaction2' 'nofaction5' 'nofaction7']
------------
['0000000001' '050615' '0512' ... 'ytuwertqwtr' 'yuertyqert'
'ZeicheMasZeuchnis']
------------
game_events['event'] 속성
- action:* - Special / power actions taken
- advance:* - Actions to advance on shipping / digging
- bridge - Bridge built
- build:D - Dwellings built
- burn - Amount of power burned
- cultist:cult - Amount of awards cultist/shapeshifter received from opponent leeching
- cultist:pw - Amount of awards cultist/shapeshifter received from all opponents declining leech
- decline:count - Number of times leech was declined
- decline:pw - Amount of leeched power that was declined
- decline-from-cultist:count - Number of times this faction declined leech form cultists
- decline-from-cultist:pw - Amount of power this faction declined from cultists
- dig - Number of spades generated by converting workers / priests (for volcano factions number of volcanos created)
- drop - 1 if faction dropped out on that turn
- favor:* - Favors taken
- leech:count - Number of times leech was accepted
- leech:pw - Amount of power generated by leeching
- leech-from-cultist:count - Number of times faction leeched from cultists / shapeshifters
- leech-from-cultist:pw - Amount of power this faction leeched from cultists / hapeshifters
- mermaid:connect - Mermaid special ability used to create town
- order:* - Being on a specific place in turn order (e.g. order:2 shows rounds where faction was second in turn order)
- pass:* - Passing to take this bonus tile
- town:* - Taking this town tile
- upgrade:* - Creating one of these buildings by upgrading
- vp: Number of VP received (from any source)
PlayerParticipation
# 데이터 파일 로드 함수 할당
def load_data(filepath) :
try :
return pd.read_csv(filepath)
except Exception as e :
print(f"Error loading {filepath} : {str(e)}")
return None
# 데이터 로드
game_factions = load_data('game_factions.csv')
모든 파일의 기본적인 데이터 특성 파악
### 모든 파일의 기본적인 데이터 특성파악과 전처리
# 함수 정의
def analyze_csv(file_path) :
try :
# 데이터 로드
df = pd.read_csv(file_path)
print(f"File loaded: {file_path}")
# 컬럼정보
print("Columns:")
print(df.columns.tolist())
# 데이터 타입
print("Data Type")
print(df.dtypes)
#결측치 개수
print("Number of Missing Values:")
print(df.isnull().sum())
#기본 통계 요약
print("Basic Statistical Summary:")
print(df.describe(include = 'all'))
print("\n" + "-" * 50 + "\n") #구분선
except Exception as e:
print(f"Failed to load {file_path}: str(e)")
# 파일 경로 리스트
file_paths = ['game_factions.csv'
,'game_locations.csv'
,'game_options.csv'
,'game_scoring_tiles.csv'
,'games.csv'
,'ratings_factions.csv'
,'ratings_player_factions.csv'
,'ratings_players.csv'
,'stats.csv'
,'sample_game_events.csv']
# 각 파일에 대한 분석 수행
for path in file_paths :
analyze_csv(path)
3. EDA - 데이터 시각화
Game Event
warnings.filterwarnings('ignore')
# figure 생성
fig = plt.figure(figsize=(10, 8))
# 상단 그리드 설정 (nrows=1, ncols=1)
gs1_nrows = 1
gs1_ncols = 1
gs1 = fig.add_gridspec(nrows = gs1_nrows, ncols = gs1_ncols, left = 0, bottom = 0.4, top = 1)
# 상단 subplot
ax1 = fig.add_subplot(gs1[0, 0])
sns.countplot(data = game_events_sample, x = 'event', ax = ax1)
ax1.set_title('Event Frequency in Game Event')
ax1.set_xticks(ax1.get_xticks()) # xticks 추가 (회전 적용)
ax1.set_xticklabels(ax1.get_xticklabels(), rotation=60)
ax1.set_xlabel('')
# 하단 그리드 설정 (nrows=1, ncols=2)
gs2_nrows = 1
gs2_ncols = 2
gs2 = fig.add_gridspec(nrows = gs2_nrows, ncols = gs2_ncols, left = 0, bottom = 0, top = 0.5)
# 하단 왼쪽 subplot (첫 번째 subplot)
ax2 = fig.add_subplot(gs2[0, 0])
sns.barplot(data = game_events_sample, x = 'faction', y = 'num', estimator = sum, ax = ax2)
ax2.set_title('Total Event Values by Faction')
ax2.set_xticks(ax2.get_xticks()) # xticks 추가 (회전 적용)
ax2.set_xticklabels(ax2.get_xticklabels(), rotation=60)
ax2.set_xlabel('')
# 하단 오른쪽 subplot (두 번째 subplot)
ax3 = fig.add_subplot(gs2[0, 1])
round_event_count = game_events_sample.groupby('round').size().reset_index(name='counts')
sns.lineplot(data = round_event_count, x = 'round', y = 'counts', marker='o')
plt.title('Event Frequency by Round')
plt.xlabel('Round')
plt.ylabel('Event Frequency')
plt.grid(True)
plt.tight_layout()
plt.show()
# 레이아웃 최적화
fig.tight_layout()
# 그래프 출력
plt.show()

Player Participation
#Player별 Count 및 Index Reset
player_activity = game_factions['player'].value_counts().reset_index()
#Player_activity의 Columns 이름 재할당
player_activity.columns = ['player', 'frequency']
#상위 20명 Player 참여 빈도 시각화
plt.figure(figsize = (12,6))
ax = sns.barplot(data=player_activity.head(20), x='player', y='frequency')
plt.title('Top 20 Players by Game Participation')
plt.xlabel('Player')
plt.ylabel('Number of Games Participated')
plt.xticks(rotation = 45)
# 막대 위에 각 frequency 값을 표시하는 코드
for p in ax.patches:
ax.annotate(format(p.get_height(), '.0f'),
(p.get_x() + p.get_width() / 2., p.get_height()),
ha='center', va='center',
xytext=(0, 8),
textcoords='offset points')
# ax.patches를 사용하여 각 막대에 접근.
# ax.annotate()로 막대 위에 값을 표시:
# p.get_height()는 막대의 높이(빈도수)를 가져옴.
# p.get_x() + p.get_width() / 2.는 막대의 중앙에 값을 배치.
# xytext=(0, 8)은 텍스트를 막대 위에 조금 띄워서 위치시킴.
# format(p.get_height(), '.0f')로 소수점 없이 정수로 표시.
plt.tight_layout()
plt.show()

작성일자 : 2024-09-25
수정일자 : 2024-09-29
Ver 0.1.2
1. 데이터 준비
Terra Mystica Snellman Statistics
Terra Mystica Snellman Statistics
Game logs and statistics from online games of Terra Mystica
www.kaggle.com
2. EDA - 데이터 파악
# 모듈 import
import os
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
# Directory 확인
os.getcwd()
'/Users/limjongjun/Desktop/JayJay/Growth/Python/Game_Analysis/ipynb'
Game Event
# 데이터 Directory 할당 및 변경
data_dir = '/Users/limjongjun/Desktop/JayJay/Growth/Python/Game_Analysis/Data'
os.chdir(data_dir)
os.getcwd()
'/Users/limjongjun/Desktop/JayJay/Growth/Python/Game_Analysis/Data'
# 데이터 불러오기 및 정보 확인
df = pd.read_csv('game_events.csv', chunksize = 1000) # 적당한 청크 크기 설정
# 샘플링을 위해 각 청크에서 일정 비율로 데이터 추출
game_events_sample = pd.concat([chunk.sample(frac = 0.01) for chunk in df]) # 예: 1% 샘플링
print(game_events_sample.info())
print('\n')
print('------------')
print('\n')
print(game_events_sample.head())
print('\n')
print('------------')
print('\n')
print("데이터 구조 확인", game_events_sample.shape)
<class 'pandas.core.frame.DataFrame'>
Index: 734191 entries, 678 to 73419021
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 event 734191 non-null object
1 faction 734191 non-null object
2 game 734183 non-null object
3 num 734191 non-null int64
4 round 734191 non-null int64
5 turn 734191 non-null int64
dtypes: int64(3), object(3)
memory usage: 39.2+ MB
None
------------
event faction game num round turn
678 upgrade:SA all 50615 1 6 3
319 leech:count dragonlords 50615 1 1 2
581 town:TW7 all 50615 1 4 2
676 upgrade:SA all 50615 1 6 1
694 build:D all 50615 1 5 4
------------
데이터 구조 확인 (734191, 6)
# columns 확인
print("<colums 확인>")
print(game_events_sample.columns)
<colums 확인>
Index(['event', 'faction', 'game', 'num', 'round', 'turn'], dtype='object')
- game: String used as a unique identifier for the game
- event: String describing various events occurring within the game (ex: vp, upgrade:TP, build:D etc).
- faction: String representing the camp associated with the event.
- num: An integer that represents the numerical value or frequency of occurrence of an event.
- round: An integer that represents the round of the game in which the event occurred.
- turn: An integer that represents the turn on which the event occurred.
print("dtype이 'object'인 컬럼 이름만 리스트로 가져오기")
print('\n')
# dtype이 'object'인 컬럼 이름만 리스트로 가져오기
object_columns = game_events_sample.select_dtypes(include='object').columns.tolist()
print(object_columns)
print('\n')
print('------------')
print('\n')
# 컬럼별 Unique 값 확인
print("컬럼별 Unique 값 확인")
print('\n')
for i in object_columns :
print(game_events_sample[i].unique())
print('\n')
print('------------')
print('\n')
dtype이 'object'인 컬럼 이름만 리스트로 가져오기
['event', 'faction', 'game']
------------
컬럼별 Unique 값 확인
['upgrade:SH' 'order:2' 'order:1' 'vp' 'upgrade:TP' 'town:TW1' 'pass:BON4'
'action:ACT2' 'pass:BON5' 'bridge' 'pass:BON6' 'dig' 'leech:pw'
'town:any' 'pass:BON9' 'leech:count' 'upgrade:TE' 'action:ACT1'
'favor:FAV7' 'favor:any' 'build:D' 'burn' 'action:ACT3' 'advance:ship'
'action:ACT4' 'town:TW4' 'action:FAV6' 'action:ACT5' 'pass:BON10'
'mermaid:connect' 'favor:FAV6' 'town:TW7' 'pick-color:brown' 'decline:pw'
'decline:count' 'favor:FAV12' 'action:BON1' 'pass:BON7' 'pass:BON1'
'upgrade:SA' 'order:4' 'favor:FAV11' 'favor:FAV4' 'pick-color:red'
'pass:BON3' 'order:3' 'town:TW3' 'favor:FAV3' 'town:TW2' 'advance:dig'
'pick-color:blue' 'action:ACT6' 'pass:BON8' 'favor:FAV10' 'town:TW6'
'favor:FAV5' 'pick-color:green' 'town:TW5' 'town:TW8' 'favor:FAV1'
'favor:FAV2' 'action:BON2' 'pass:BON2' 'action:ACTN' 'order:5'
'favor:FAV8' 'favor:FAV9' 'action:ACTA' 'action:ACTE' 'action:ACTW'
'leech-from-cultist:pw' 'decline-from-cultist:count'
'leech-from-cultist:count' 'decline-from-cultist:pw' 'cultist:cult'
'cultist:pw' 'drop' 'action:ACTS' 'action:ACTC' 'action:ACTH2'
'pick-color:black' 'pick-color:gray' 'action:ACTH1' 'pick-color:yellow'
'action:ACTG' 'order:6' 'action:ACTH5' 'action:ACTH3' 'action:ACTH4'
'order:7' 'action:ACTH6']
------------
['halflings' 'all' 'mermaids' 'dragonlords' 'riverwalkers' 'yetis'
'darklings' 'chaosmagicians' 'engineers' 'icemaidens' 'nomads' 'auren'
'fakirs' 'witches' 'cultists' 'alchemists' 'swarmlings' 'dwarves'
'shapeshifters' 'nofaction1' 'nofaction4' 'acolytes' 'giants'
'nofaction3' 'nofaction2' 'nofaction5' 'nofaction7']
------------
['0000000001' '050615' '0512' ... 'ytuwertqwtr' 'yuertyqert'
'ZeicheMasZeuchnis']
------------
game_events['event'] 속성
- action:* - Special / power actions taken
- advance:* - Actions to advance on shipping / digging
- bridge - Bridge built
- build:D - Dwellings built
- burn - Amount of power burned
- cultist:cult - Amount of awards cultist/shapeshifter received from opponent leeching
- cultist:pw - Amount of awards cultist/shapeshifter received from all opponents declining leech
- decline:count - Number of times leech was declined
- decline:pw - Amount of leeched power that was declined
- decline-from-cultist:count - Number of times this faction declined leech form cultists
- decline-from-cultist:pw - Amount of power this faction declined from cultists
- dig - Number of spades generated by converting workers / priests (for volcano factions number of volcanos created)
- drop - 1 if faction dropped out on that turn
- favor:* - Favors taken
- leech:count - Number of times leech was accepted
- leech:pw - Amount of power generated by leeching
- leech-from-cultist:count - Number of times faction leeched from cultists / shapeshifters
- leech-from-cultist:pw - Amount of power this faction leeched from cultists / hapeshifters
- mermaid:connect - Mermaid special ability used to create town
- order:* - Being on a specific place in turn order (e.g. order:2 shows rounds where faction was second in turn order)
- pass:* - Passing to take this bonus tile
- town:* - Taking this town tile
- upgrade:* - Creating one of these buildings by upgrading
- vp: Number of VP received (from any source)
PlayerParticipation
# 데이터 파일 로드 함수 할당
def load_data(filepath) :
try :
return pd.read_csv(filepath)
except Exception as e :
print(f"Error loading {filepath} : {str(e)}")
return None
# 데이터 로드
game_factions = load_data('game_factions.csv')
모든 파일의 기본적인 데이터 특성 파악
### 모든 파일의 기본적인 데이터 특성파악과 전처리
# 함수 정의
def analyze_csv(file_path) :
try :
# 데이터 로드
df = pd.read_csv(file_path)
print(f"File loaded: {file_path}")
# 컬럼정보
print("Columns:")
print(df.columns.tolist())
# 데이터 타입
print("Data Type")
print(df.dtypes)
#결측치 개수
print("Number of Missing Values:")
print(df.isnull().sum())
#기본 통계 요약
print("Basic Statistical Summary:")
print(df.describe(include = 'all'))
print("\n" + "-" * 50 + "\n") #구분선
except Exception as e:
print(f"Failed to load {file_path}: str(e)")
# 파일 경로 리스트
file_paths = ['game_factions.csv'
,'game_locations.csv'
,'game_options.csv'
,'game_scoring_tiles.csv'
,'games.csv'
,'ratings_factions.csv'
,'ratings_player_factions.csv'
,'ratings_players.csv'
,'stats.csv'
,'sample_game_events.csv']
# 각 파일에 대한 분석 수행
for path in file_paths :
analyze_csv(path)
3. EDA - 데이터 시각화
Game Event
warnings.filterwarnings('ignore')
# figure 생성
fig = plt.figure(figsize=(10, 8))
# 상단 그리드 설정 (nrows=1, ncols=1)
gs1_nrows = 1
gs1_ncols = 1
gs1 = fig.add_gridspec(nrows = gs1_nrows, ncols = gs1_ncols, left = 0, bottom = 0.4, top = 1)
# 상단 subplot
ax1 = fig.add_subplot(gs1[0, 0])
sns.countplot(data = game_events_sample, x = 'event', ax = ax1)
ax1.set_title('Event Frequency in Game Event')
ax1.set_xticks(ax1.get_xticks()) # xticks 추가 (회전 적용)
ax1.set_xticklabels(ax1.get_xticklabels(), rotation=60)
ax1.set_xlabel('')
# 하단 그리드 설정 (nrows=1, ncols=2)
gs2_nrows = 1
gs2_ncols = 2
gs2 = fig.add_gridspec(nrows = gs2_nrows, ncols = gs2_ncols, left = 0, bottom = 0, top = 0.5)
# 하단 왼쪽 subplot (첫 번째 subplot)
ax2 = fig.add_subplot(gs2[0, 0])
sns.barplot(data = game_events_sample, x = 'faction', y = 'num', estimator = sum, ax = ax2)
ax2.set_title('Total Event Values by Faction')
ax2.set_xticks(ax2.get_xticks()) # xticks 추가 (회전 적용)
ax2.set_xticklabels(ax2.get_xticklabels(), rotation=60)
ax2.set_xlabel('')
# 하단 오른쪽 subplot (두 번째 subplot)
ax3 = fig.add_subplot(gs2[0, 1])
round_event_count = game_events_sample.groupby('round').size().reset_index(name='counts')
sns.lineplot(data = round_event_count, x = 'round', y = 'counts', marker='o')
plt.title('Event Frequency by Round')
plt.xlabel('Round')
plt.ylabel('Event Frequency')
plt.grid(True)
plt.tight_layout()
plt.show()
# 레이아웃 최적화
fig.tight_layout()
# 그래프 출력
plt.show()

Player Participation
#Player별 Count 및 Index Reset
player_activity = game_factions['player'].value_counts().reset_index()
#Player_activity의 Columns 이름 재할당
player_activity.columns = ['player', 'frequency']
#상위 20명 Player 참여 빈도 시각화
plt.figure(figsize = (12,6))
ax = sns.barplot(data=player_activity.head(20), x='player', y='frequency')
plt.title('Top 20 Players by Game Participation')
plt.xlabel('Player')
plt.ylabel('Number of Games Participated')
plt.xticks(rotation = 45)
# 막대 위에 각 frequency 값을 표시하는 코드
for p in ax.patches:
ax.annotate(format(p.get_height(), '.0f'),
(p.get_x() + p.get_width() / 2., p.get_height()),
ha='center', va='center',
xytext=(0, 8),
textcoords='offset points')
# ax.patches를 사용하여 각 막대에 접근.
# ax.annotate()로 막대 위에 값을 표시:
# p.get_height()는 막대의 높이(빈도수)를 가져옴.
# p.get_x() + p.get_width() / 2.는 막대의 중앙에 값을 배치.
# xytext=(0, 8)은 텍스트를 막대 위에 조금 띄워서 위치시킴.
# format(p.get_height(), '.0f')로 소수점 없이 정수로 표시.
plt.tight_layout()
plt.show()
