Measures of Central Tendency

mode (the most frequently occurring value)

mean (the arithmetic average)

$\bar{x} = \frac{\Sigma x}{n}$ (sample), $\mu = \frac{\Sigma X}{N}$ (population)

median (the middle value of the sorted data)
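As a quick check of the three measures above, here is a minimal sketch using Python's standard `statistics` module. The `scores` list is made-up illustration data and does not come from these notes.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
scores = [2, 3, 3, 5, 7, 10]

mode_value = statistics.mode(scores)      # most frequent value
mean_value = statistics.mean(scores)      # sum of the values divided by n
median_value = statistics.median(scores)  # middle value of the sorted data

print(mode_value, mean_value, median_value)
```

With an even number of values, as here, the median is the average of the two middle values.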

Variability of Data

$IQR = Q_{3} - Q_{1}$ (75th percentile minus 25th percentile)

$Outlier < Q_{1} - 1.5 \times IQR$ or $Outlier > Q_{3} + 1.5 \times IQR$
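The IQR and the outlier rule can be sketched as follows. The `data` list is hypothetical, and the quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which is only one of several quartile conventions; other conventions may give slightly different fences.

```python
import statistics

# Hypothetical data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]

# n=4 returns the three quartile cut points Q1, Q2 (the median), and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Values beyond these fences are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr, outliers)
```

These fences are the same ones a box plot typically uses to decide where the whiskers end and which points are drawn individually.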

box plot

variance

$SS = \sum (x_{i} - \bar{x})^2$ (sum of squared deviations); the variance is $\sigma^2 = \frac{SS}{n}$

standard deviation

$\sigma = \sqrt{\frac{\sum (x_{i} - \bar{x})^2}{n}}$

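When the data are normally distributed, about 68% of the values lie within one standard deviation of the mean, and about 95% lie within two standard deviations.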
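The 68% and 95% figures can be checked against the normal cumulative distribution function. This is a small sketch using `statistics.NormalDist` from the standard library (Python 3.8+); the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_one = share_within(1)   # roughly 0.68
within_two = share_within(2)   # roughly 0.95

print(round(within_one, 4), round(within_two, 4))
```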

Standard deviation with Bessel's correction (sample standard deviation): $s = \sqrt{\frac{\sum (x_{i} - \bar{x})^2}{n - 1}}$
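To make the difference between dividing by $n$ and by $n - 1$ concrete, the formulas can be evaluated by hand in plain Python. The `values` list is an arbitrary illustration, not data from these notes.

```python
import math

# Hypothetical sample, for illustration only.
values = [1, 2, 3, 4, 5]
n = len(values)
mean = sum(values) / n

# Sum of squared deviations from the mean.
ss = sum((x - mean) ** 2 for x in values)

variance = ss / n                 # variance, dividing by n
sigma = math.sqrt(ss / n)         # standard deviation, dividing by n
s = math.sqrt(ss / (n - 1))       # Bessel-corrected sample standard deviation

print(ss, variance, sigma, s)
```

Because the corrected version divides by a smaller number, `s` is always at least as large as `sigma` for the same data.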

NumPy & Pandas Tutorials

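import numpy

numbers = [1, 2, 3, 4, 5]
numpy.mean(numbers)    # arithmetic mean
numpy.median(numbers)  # middle value of the sorted data
numpy.std(numbers)     # standard deviation, dividing by n (ddof=0 is the default)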
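Note that `numpy.std` divides by n unless told otherwise. A short sketch of how the `ddof` parameter switches to the Bessel-corrected version, reusing the same `numbers` list:

```python
import numpy

numbers = [1, 2, 3, 4, 5]

# Default ddof=0: divide by n.
population_std = numpy.std(numbers)

# ddof=1: divide by n - 1 (Bessel's correction).
sample_std = numpy.std(numbers, ddof=1)

print(population_std, sample_std)
```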

Create a DataFrame

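from pandas import DataFrame, Series

people = ['Sarah', 'Mike', 'Chrisna']
ages = [28, 32, 25]
# Each dictionary key becomes a column; each Series supplies that column's values.
df = DataFrame({'name': Series(people),
                'age': Series(ages)})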

Pandas Vectorized Methods
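The notes give no example under this heading, so the following is only a sketch of what vectorized operations on a DataFrame typically look like, assuming a `df` with the same `name` and `age` columns as above. Each operation applies to a whole column at once, without an explicit Python loop.

```python
from pandas import DataFrame

# Same illustrative data as the DataFrame example in these notes.
df = DataFrame({'name': ['Sarah', 'Mike', 'Chrisna'],
                'age': [28, 32, 25]})

mean_age = df['age'].mean()            # aggregate over the whole column
ages_next_year = df['age'] + 1         # element-wise arithmetic
upper_names = df['name'].str.upper()   # element-wise string method
older_than_26 = df[df['age'] > 26]     # boolean mask selects matching rows

print(mean_age)
print(ages_next_year.tolist())
print(upper_names.tolist())
print(older_than_26['name'].tolist())
```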
