[Python] Analyzing Excel data with pandas

import pandas as pd  # load pandas

score = pd.read_excel('/Users/wkrdm/Python/db_score.xlsx',  # file path
                      header = 0,  # use the first row as the column names
                      skipfooter = 3,  # skip the last 3 rows (not needed)
                      usecols = 'A:H')  # range of columns to load
print(score.head(3))  # print the first 3 rows
print(score.tail(3))  # print the last 3 rows

score.info()  # print a summary of the DataFrame

<class 'pandas.core.frame.DataFrame'>  # a pandas DataFrame
RangeIndex: 92 entries, 0 to 91  # 92 rows, indexed 0 to 91
Data columns (total 8 columns):  # 8 columns in total
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   sno         92 non-null     int64    # sno: 92 non-null rows (no blanks), integer
 1   attendance  92 non-null     float64
 2   homework    92 non-null     float64
 3   discussion  92 non-null     int64  
 4   midterm     92 non-null     float64
 5   final       92 non-null     float64
 6   score       92 non-null     float64
 7   grade       92 non-null     object 
dtypes: float64(5), int64(2), object(1)  # 5 float columns, 2 int columns, 1 object (string) column
memory usage: 5.9+ KB
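
The dtype and non-null counts that info() prints can also be computed directly, which helps when you want to check them in code. Below is a minimal sketch on a small made-up frame; `sample` and its values are hypothetical stand-ins, not rows from db_score.xlsx.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet; the values are made up.
sample = pd.DataFrame({
    "sno": [1, 2, 3],
    "midterm": [20.5, 18.0, 25.0],
    "grade": ["A", "B", "A"],
})

# Name of each column's dtype, e.g. "int64" or "float64".
# Text columns show up as "object" (or a string dtype in newer pandas).
dtype_names = sample.dtypes.astype(str)

dtype_counts = dtype_names.value_counts()  # number of columns per dtype
null_counts = sample.isna().sum()          # missing values per column

print(dtype_counts)
print(null_counts)
```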

Statistical analysis of the data (mean, median, variance, standard deviation)

score["midterm"].mean()  # mean of the midterm column
score[["midterm", "final", "score"]].mean()  # mean of each column

score["midterm"].median()  # median of the midterm column
score[["midterm", "final", "score"]].median()  # median of each column

score["midterm"].var()  # sample variance of the midterm column (ddof=1 by default)
score[["midterm", "final", "score"]].var()  # sample variance of each column

score["midterm"].std()  # sample standard deviation of the midterm column (ddof=1 by default)
score[["midterm", "final", "score"]].std()  # sample standard deviation of each column
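
One detail worth knowing: pandas computes var() and std() with ddof=1 by default, so it divides by n - 1 (the sample statistic) rather than by n. The sketch below shows the difference on a small made-up series; `values` is hypothetical and not taken from the spreadsheet.

```python
import pandas as pd

# Made-up numbers whose mean is 5 and whose squared deviations sum to 32.
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sample_var = values.var()            # default ddof=1: 32 / (8 - 1)
population_var = values.var(ddof=0)  # ddof=0: 32 / 8
sample_std = values.std()            # square root of sample_var
population_std = values.std(ddof=0)  # square root of population_var

print(sample_var, population_var)
print(sample_std, population_std)
```

If the 92 rows are treated as the whole population rather than a sample, passing ddof=0 may be the better fit.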

score[["midterm", "final", "score"]].describe()  # summary statistics

score.agg({'midterm':['mean','median'], 'final':['mean','median']})  # chosen statistics for chosen columns
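
When agg() receives a dict of lists, the result is a DataFrame with one row per statistic and one column per key. If the columns ask for different statistics, the combinations that were not requested come back as NaN. A minimal sketch on made-up data (`sample` is hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet; the values are made up.
sample = pd.DataFrame({
    "midterm": [10.0, 20.0, 30.0],
    "final": [15.0, 25.0, 50.0],
})

# Different statistics per column: rows are the statistic names.
summary = sample.agg({"midterm": ["mean", "median"], "final": ["mean", "max"]})

midterm_mean = summary.loc["mean", "midterm"]  # requested, so it has a value
final_max = summary.loc["max", "final"]        # requested, so it has a value
midterm_max = summary.loc["max", "midterm"]    # not requested, so NaN

print(summary)
```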

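score.groupby("grade")[["midterm", "final"]].describe()  # per-group statistics; group first, then select columns (the data has no "Gender" column, so grade is used here)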
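
The grouping column has to exist in the frame that groupby() is called on. Selecting only the score columns first drops the key, and the call then raises a KeyError. The sketch below shows both orders on made-up data; `sample` is hypothetical, and grouping by grade is an assumption based on the columns listed by info().

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet; the values are made up.
sample = pd.DataFrame({
    "grade": ["A", "B", "A", "B"],
    "midterm": [30.0, 20.0, 28.0, 22.0],
    "final": [35.0, 25.0, 33.0, 21.0],
})

# Group first, then pick the columns to summarize.
by_grade = sample.groupby("grade")[["midterm", "final"]].mean()
print(by_grade)

# Selecting columns first removes "grade", so grouping by it fails.
try:
    sample[["midterm", "final"]].groupby("grade")
    grouping_failed = False
except KeyError:
    grouping_failed = True
print(grouping_failed)
```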