Qualitative vs Quantitative predictors /u/godshammer_86 Python Education

Hi everyone.

Apologies if this isn’t the best place to post this; I thought it’d be better than r/learnpython since it’s a somewhat more advanced question.

I’m working through Introduction to Statistical Learning with Python and am currently on Chapter 2, Exercise 9. This exercise uses the Auto data set, which has the following predictors:

mpg, cylinders, displacement, horsepower, weight, acceleration, year, origin, name

Part (a) of this question asks: *Which of the predictors are quantitative, and which are qualitative?*

I sorted them as follows:

quantitative: mpg, displacement, horsepower, weight, acceleration
qualitative: cylinders, year, origin, name

I then consulted some other people’s solutions online (as well as some Google searches) and found the following results:

Using df.select_dtypes(include=['number']).columns and df.select_dtypes(exclude=['number']).columns gave the answer that only “name” is qualitative; all others are quantitative.
Only “name” and “origin” are qualitative; all others are quantitative.
All variables except “horsepower” and “name” are quantitative.
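For what it’s worth, the dtype-based answer can be reproduced with a small sketch: `select_dtypes` sorts columns by how pandas stores them, not by what they mean, so integer-coded columns such as cylinders, year, and origin always land on the numeric side. The CSV below is a hypothetical three-row stand-in with made-up values, `split_by_dtype` is a hypothetical helper, and the `?` in horsepower is an assumption about how missing values might be marked in a given copy of the file. If a copy does use such a placeholder, that would be one plausible explanation for the answer that treats horsepower as non-numeric.

```python
import io

import pandas as pd

# Hypothetical toy rows in the same column layout as the Auto data (values are
# illustrative only). The "?" is an ASSUMED missing-value placeholder.
RAW_CSV = """mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin,name
18.0,8,307.0,130,3504,12.0,70,1,chevrolet chevelle malibu
25.0,4,98.0,?,2046,19.0,71,1,ford pinto
31.0,4,71.0,65,1773,19.0,71,3,toyota corolla 1200
"""


def split_by_dtype(df):
    """Hypothetical helper: split columns by pandas storage dtype,
    NOT by their statistical meaning."""
    numeric = list(df.select_dtypes(include=["number"]).columns)
    non_numeric = list(df.select_dtypes(exclude=["number"]).columns)
    return numeric, non_numeric


# Read as-is: the "?" forces horsepower to be stored as strings (object dtype),
# so horsepower and name end up on the non-numeric side.
raw = pd.read_csv(io.StringIO(RAW_CSV))
print(split_by_dtype(raw))

# Read with "?" treated as missing: horsepower becomes a float column,
# leaving only name on the non-numeric side.
clean = pd.read_csv(io.StringIO(RAW_CSV), na_values=["?"])
print(split_by_dtype(clean))
```

Note that in both cases cylinders, year, and origin are reported as numeric simply because they are stored as integers.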

Some Google results also stated that “year”, for example, is a quantitative predictor, not a qualitative one as I would have expected.

Am I misunderstanding how to classify a predictor as either qualitative or quantitative?

In my mind, qualitative is more or less synonymous with categorical: there is a finite number of categories into which a value can be placed. It also helps me to think about whether the value can (or is likely to) change for a given observation. For example, ‘mpg’ is quantitative (in part) because it could easily change as the car is used, whereas a car’s model year or number of cylinders can’t change, so the cars can be sorted into discrete categories based on these characteristics.

By this understanding, I would think predictors such as cylinders (4-cyl, V6, V8) and the year the car was manufactured (1970, 1971, 1972, etc.) would be qualitative/categorical.
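If cylinders, year, and origin are meant to be qualitative, one way to make that explicit in pandas is to convert them to the `category` dtype, after which `select_dtypes` agrees with that classification. This is a minimal sketch on hypothetical made-up rows using only a subset of the Auto columns. `pd.get_dummies` then shows the practical difference: a quantitative column stays a single numeric column, while a qualitative one is expanded into indicator columns, one per level apart from a dropped baseline.

```python
import pandas as pd

# Hypothetical toy rows with a subset of Auto-style columns (values made up).
df = pd.DataFrame(
    {
        "mpg": [18.0, 25.0, 31.0, 15.0],
        "cylinders": [8, 4, 4, 6],
        "year": [70, 71, 71, 72],
        "origin": [1, 1, 3, 2],
    }
)

# Mark the columns chosen as qualitative explicitly (the post's classification,
# restricted to the columns present here).
QUALITATIVE = ["cylinders", "year", "origin"]
for col in QUALITATIVE:
    df[col] = df[col].astype("category")

# select_dtypes now reflects the chosen classification.
print(list(df.select_dtypes(include=["number"]).columns))
print(list(df.select_dtypes(include=["category"]).columns))

# Quantitative columns pass through unchanged; each qualitative column becomes
# indicator columns, one per level, with the first (sorted) level dropped as baseline.
design = pd.get_dummies(df, columns=QUALITATIVE, drop_first=True)
print(list(design.columns))
```

In a linear model, keeping year numeric fits one slope shared across all years, while the dummy version fits a separate offset for each year relative to the baseline, at the cost of more parameters.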

Am I thinking about this wrong? Or is my solution a fairly accurate way of thinking?

submitted by /u/godshammer_86
