Thursday, July 15, 2021

Variables in Statistics

Data collection is the first step of data analysis. Once the data has been collected, it is important to understand the structural parts of the dataset and how they are measured.

Let us consider the first three rows of the WNBA basketball dataset (wnba.csv):

   Name           Team  Pos  Height  Weight  BMI        Birth_Place
0  Aerial Powers  DAL   F    183     71.0    21.200991  US
1  Alana Beard    LA    G/F  185     73.0    21.329438  US
2  Alex Bentley   CON   G    170     69.0    23.875433  US

   Birthdate         Age  College         Experience  Games Played  MIN  FGM
0  January 17, 1994  23   Michigan State  2           8             173  30
1  May 14, 1982      35   Duke            12          30            947  90
2  October 27, 1990  26   Penn State      4           26            617  82

   FGA  FG%   3PM  3PA  3P%   FTM  FTA  FT%   OREB  DREB  REB  AST  STL
0  85   35.3  12   32   37.5  21   26   80.8  6     22    28   12   3
1  177  50.8  5    18   27.8  32   41   78.0  19    82    101  72   63
2  218  37.6  19   64   29.7  35   42   83.3  4     36    40   78   22

   BLK  TO  PTS  DD2  TD3
0  6    12  93   0    0
1  13   40  217  0    0
2  3    24  218  0    0
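The code in script.py that produced this output is not included here. Below is a minimal sketch of the same step using only the standard library, assuming the column names shown above. A hypothetical inline excerpt (a few columns of the three rows) stands in for wnba.csv so the snippet runs on its own. With pandas, the usual equivalent is pd.read_csv("wnba.csv").head(3).

```python
import csv
import io
from itertools import islice

# Hypothetical stand-in for wnba.csv: a few columns of the three rows shown above,
# inlined so the example runs without downloading the dataset.
SAMPLE_CSV = """Name,Team,Pos,Height,Weight,Birth_Place,Age,PTS
Aerial Powers,DAL,F,183,71.0,US,23,93
Alana Beard,LA,G/F,185,73.0,US,35,217
Alex Bentley,CON,G,170,69.0,US,26,218
"""


def first_rows(csv_text, n):
    """Return the first n records of CSV text as dicts keyed by column name.

    All values stay strings, because the csv module does no type conversion.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(islice(reader, n))


# For the real file, read its contents first, e.g.:
#   with open("wnba.csv", newline="") as f:
#       rows = first_rows(f.read(), 3)
rows = first_rows(SAMPLE_CSV, 3)
for row in rows:
    print(row["Name"], row["Team"], row["Pos"], row["Height"])
```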


The columns of a dataset represent properties of each item or individual. In practice, we limit ourselves to the properties that are relevant to the questions we want to answer and that we can actually measure.
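Limiting ourselves to the relevant properties amounts to keeping only some of the columns. The sketch below assumes a hypothetical question, "how many points does each player score per minute?", so only Name, MIN and PTS are kept. The records are a few columns copied from the rows above. In pandas the same selection is usually written as wnba[["Name", "MIN", "PTS"]].

```python
# A few columns of the three rows shown above.
players = [
    {"Name": "Aerial Powers", "Team": "DAL", "Pos": "F", "Height": 183, "MIN": 173, "PTS": 93},
    {"Name": "Alana Beard", "Team": "LA", "Pos": "G/F", "Height": 185, "MIN": 947, "PTS": 217},
    {"Name": "Alex Bentley", "Team": "CON", "Pos": "G", "Height": 170, "MIN": 617, "PTS": 218},
]

# Hypothetical question: points scored per minute played.
# Only the properties relevant to that question are kept.
RELEVANT = ["Name", "MIN", "PTS"]


def select_columns(records, columns):
    """Return new records containing only the given columns, in the given order."""
    return [{col: rec[col] for col in columns} for rec in records]


subset = select_columns(players, RELEVANT)
for rec in subset:
    print(rec["Name"], round(rec["PTS"] / rec["MIN"], 3))
```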

Properties whose values vary from one individual to another are called variables. Variables in statistics can describe either quantities or qualities.
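Whether a property varies depends on the records being compared. The minimal sketch below reports which columns take more than one value in a handful of records, using a few columns of the three rows above. In this small sample Birth_Place is "US" and DD2 is 0 for every player, so they look constant here, although they may vary across the full dataset.

```python
# A few columns of the three rows shown above.
players = [
    {"Name": "Aerial Powers", "Team": "DAL", "Pos": "F", "Height": 183,
     "Birth_Place": "US", "DD2": 0},
    {"Name": "Alana Beard", "Team": "LA", "Pos": "G/F", "Height": 185,
     "Birth_Place": "US", "DD2": 0},
    {"Name": "Alex Bentley", "Team": "CON", "Pos": "G", "Height": 170,
     "Birth_Place": "US", "DD2": 0},
]


def varying_columns(records):
    """Return the columns whose values differ between at least two records."""
    columns = list(records[0])
    return [col for col in columns if len({rec[col] for rec in records}) > 1]


def constant_columns(records):
    """Return the columns that take the same value in every record."""
    varying = set(varying_columns(records))
    return [col for col in records[0] if col not in varying]


print("vary:", varying_columns(players))
print("constant in this sample:", constant_columns(players))
```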

Quantitative and categorical variables

Quantitative

Generally, a variable that describes how much there is of something describes a quantity, and is called a quantitative variable. Usually quantitative variables are expressed as real numbers, but they can be expressed in words as well.

For example, height can be recorded as "160 cm", or in words as "tall" or "short".
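The same quantity can be recorded in both forms. The sketch below turns the numeric heights from the output above into the words "tall" and "short". The 180 cm cut-off is a hypothetical value chosen only for illustration. Either form still says how much height a player has, although the words keep less detail than the numbers.

```python
# Heights in centimetres, copied from the output above.
heights_cm = {"Aerial Powers": 183, "Alana Beard": 185, "Alex Bentley": 170}

# Hypothetical threshold, for illustration only.
TALL_CUTOFF_CM = 180


def height_label(cm, cutoff=TALL_CUTOFF_CM):
    """Describe a height in words instead of numbers."""
    return "tall" if cm >= cutoff else "short"


for name, cm in heights_cm.items():
    print(f"{name}: {cm} cm, {height_label(cm)}")
```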

Categorical

Usually, qualitative variables describe qualities using words, but numbers can also be used.

For example, a player can be identified by the name "Brindha" or just by the number "9". Here 9 is nothing more than an identification number for Brindha; it carries no quantitative meaning.

Categorical variables are also called qualitative variables.
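Because a number used as a label carries no quantity, the useful operations on categorical values are checking whether two values are equal and counting how often each value occurs. Averaging the codes would produce a number with no meaning. The minimal sketch below uses team and birth-place values from the output above, together with hypothetical numeric team codes.

```python
from collections import Counter

# Categorical values copied from the three rows shown above.
teams = ["DAL", "LA", "CON"]
birth_places = ["US", "US", "US"]

# Hypothetical numeric codes for the teams. Like the "9" that identifies Brindha,
# they are only labels, so they are kept as strings rather than ints.
TEAM_CODES = {"DAL": "1", "LA": "2", "CON": "3"}
coded_teams = [TEAM_CODES[team] for team in teams]

# Meaningful operations on categorical values: checking equality ...
print(coded_teams[0] == coded_teams[1])  # do the first two players share a team?
# ... and counting how often each category occurs.
print(Counter(birth_places))
```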


Dataset:
https://www.kaggle.com/jinxbe/wnba-player-stats-2017
Glossary:
https://www.basketball-reference.com/about/glossary.html


Scales of measurement

A system of measurement is made up of four different scales of measurement: nominal, ordinal, interval, and ratio. The characteristics of each scale revolve around three main questions:
  • Can we tell whether two individuals are different?
  • Can we tell the direction of the difference?
  • Can we tell the size of the difference?
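The sketch below records, for each scale, the answers to the three questions, following the standard textbook definitions of the scales. The three questions alone do not separate interval from ratio. The usual distinguishing feature is that only the ratio scale has a true zero point, which is what makes statements like "twice as heavy" meaningful.

```python
# For each scale: answers to the three questions above, in order
# (different?, direction of difference?, size of difference?).
SCALES = {
    "nominal": (True, False, False),
    "ordinal": (True, True, False),
    "interval": (True, True, True),
    "ratio": (True, True, True),
}

QUESTIONS = (
    "Can we tell whether two individuals are different?",
    "Can we tell the direction of the difference?",
    "Can we tell the size of the difference?",
)


def scales_answering(question):
    """Return the scales for which the given question can be answered with 'yes'."""
    index = QUESTIONS.index(question)
    return [name for name, answers in SCALES.items() if answers[index]]


for question in QUESTIONS:
    print(question, "->", ", ".join(scales_answering(question)))
```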



The nominal scale
