Variables in Statistics
Data collection is the first step of data analysis. Once the data is collected, it is important to understand the structural parts of a dataset and how they are measured.
Let us consider 3 rows from the basketball dataset:
Output
Name Team Pos Height Weight BMI Birth_Place \
0 Aerial Powers DAL F 183 71.0 21.200991 US
1 Alana Beard LA G/F 185 73.0 21.329438 US
2 Alex Bentley CON G 170 69.0 23.875433 US
Birthdate Age College Experience Games Played MIN FGM \
0 January 17, 1994 23 Michigan State 2 8 173 30
1 May 14, 1982 35 Duke 12 30 947 90
2 October 27, 1990 26 Penn State 4 26 617 82
FGA FG% 15:00 3PA 3P% FTM FTA FT% OREB DREB REB AST STL \
0 85 35.3 12 32 37.5 21 26 80.8 6 22 28 12 3
1 177 50.8 5 18 27.8 32 41 78.0 19 82 101 72 63
2 218 37.6 19 64 29.7 35 42 83.3 4 36 40 78 22
BLK TO PTS DD2 TD3
0 6 12 93 0 0
1 13 40 217 0 0
2 3 24 218 0 0
The column names in a dataset represent properties of each item or individual. In practice, we limit ourselves to the properties relevant to the questions we want to answer, and to the properties we can actually measure.
Properties whose values vary from one individual to another are called variables. Variables in statistics can describe either quantities or qualities.
Quantitative and categorical variables
Quantitative - Generally, a variable that describes how much there is of something describes a quantity. Quantitative variables are usually expressed as real numbers, but they can be expressed in words as well.
For example, height can be recorded as "160 cm", or as "tall" or "short".
Categorical - Usually, categorical variables describe qualities using words, but numbers can also be used.
For example, Name can be "Brindha" or just "9". Here, 9 is nothing more than an identification number for Brindha; it carries no quantitative meaning.
Categorical variables are also called qualitative variables.
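As a rough illustration of the distinction, the sketch below takes a few columns from the three rows shown earlier and guesses each variable's type from its Python value type. The `rows` sample, the hypothetical `Player_ID` column, the `MANUAL_OVERRIDES` mapping, and the `classify` helper are all assumptions made for this example. The type check is only a heuristic: a number used as a label, like the identification number above, has to be marked as categorical by hand.

```python
from numbers import Number

# Three rows and a handful of columns copied from the output above.
# Player_ID is a hypothetical column added to illustrate a numeric label.
rows = [
    {"Player_ID": 0, "Name": "Aerial Powers", "Team": "DAL", "Pos": "F",
     "Height": 183, "Weight": 71.0, "PTS": 93},
    {"Player_ID": 1, "Name": "Alana Beard", "Team": "LA", "Pos": "G/F",
     "Height": 185, "Weight": 73.0, "PTS": 217},
    {"Player_ID": 2, "Name": "Alex Bentley", "Team": "CON", "Pos": "G",
     "Height": 170, "Weight": 69.0, "PTS": 218},
]

# Columns whose numbers are labels, not quantities (an assumption for this sketch).
MANUAL_OVERRIDES = {"Player_ID": "categorical"}


def classify(column, values):
    """Guess whether a column is quantitative or categorical."""
    if column in MANUAL_OVERRIDES:
        return MANUAL_OVERRIDES[column]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "quantitative"
    return "categorical"


variable_types = {
    column: classify(column, [row[column] for row in rows])
    for column in rows[0]
}

for column, kind in variable_types.items():
    print(f"{column}: {kind}")
```

With this sample, Name, Team, and Pos come out as categorical, while Height, Weight, and PTS come out as quantitative. Without the override, Player_ID would be misread as a quantity, which is exactly the trap the "9" example describes.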
Dataset:
https://www.kaggle.com/jinxbe/wnba-player-stats-2017
Glossary:
https://www.basketball-reference.com/about/glossary.html
Output
Name Team Pos Height Weight BMI Birth_Place \
0 Aerial Powers DAL F 183 71.0 21.200991 US
Birthdate Age College Experience Games Played MIN FGM \
0 January 17, 1994 23 Michigan State 2 8 173 30
FGA FG% 15:00 3PA 3P% FTM FTA FT% OREB DREB REB AST STL \
0 85 35.3 12 32 37.5 21 26 80.8 6 22 28 12 3
BLK TO PTS DD2 TD3
0 6 12 93 0 0
Scale of measurement
A system of measurement is made up of four scales of measurement: nominal, ordinal, interval, and ratio. The characteristics of each scale pivot around three main questions:
- Can we tell whether two individuals are different?
- Can we tell the direction of the difference?
- Can we tell the size of the difference?
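The three questions can be laid out as a small lookup table. The answers below follow the standard textbook definitions of the four scales rather than anything stated in this post, and the `Scale` class and `scale_for` helper are names invented for this sketch. Interval and ratio scales answer all three questions the same way; what separates them is whether zero means a true absence of the quantity, so the sketch adds that as a fourth field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scale:
    name: str
    tells_difference: bool  # Can we tell whether two individuals are different?
    tells_direction: bool   # Can we tell the direction of the difference?
    tells_size: bool        # Can we tell the size of the difference?
    true_zero: bool         # Does zero mean "none of the quantity"?


SCALES = [
    Scale("nominal", True, False, False, False),
    Scale("ordinal", True, True, False, False),
    Scale("interval", True, True, True, False),
    Scale("ratio", True, True, True, True),
]


def scale_for(direction, size, true_zero=False):
    """Return the name of the scale that matches the given answers."""
    for scale in SCALES:
        answers = (scale.tells_direction, scale.tells_size, scale.true_zero)
        if answers == (direction, size, true_zero):
            return scale.name
    raise ValueError("no scale matches these answers")


# Team: we can tell players apart by team, but there is no direction or size.
print(scale_for(direction=False, size=False))
# Height in cm: direction and size are known, and 0 cm means no height.
print(scale_for(direction=True, size=True, true_zero=True))
```

Reading Team as nominal and Height as ratio is an interpretation of the dataset columns shown above, offered as an example rather than a classification taken from the dataset's documentation.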