Continuing our study of the diamonds data, let us apply some basic exploration functions and see what information we can draw from them.
EXPLORE THE DATA:
1. Length of the data:
Use the len() function to see the number of rows of data.
2. Names of the columns:
Use the columns attribute, i.e. diamonds.columns (it is an attribute, not a function, so no parentheses).
3. Analyse the first few values:
Use the head() function; by default, the first 5 rows are displayed.
4. Analyse the last few values:
Use the tail() function; by default, the last 5 rows are displayed.
5. Random selection:
Use the sample() function to view randomly chosen rows from a large data set.
6. Statistical information:
Use the describe() function on the numeric fields to get summary statistics: count, mean, standard deviation, minimum, quartiles (the 50% row is the median) and maximum. The range follows from the minimum and maximum.
7. Determine the correlation between fields:
The corr() function computes the pairwise correlation between the numeric fields in the data set.
8. Values stored in the non-numeric fields:
We can view the values with diamonds['color'], but the column has many repeated values, so it is better to view only the unique entries. Use the unique() function for this.
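The eight steps above can be sketched in one short script. This is only an illustration: the small diamonds frame below is a hypothetical stand-in with invented values, not the real data set, and the variable names are my own. The correlation step selects the numeric columns first, because recent pandas versions raise an error if corr() is given text columns.

```python
import pandas as pd

# Hypothetical miniature stand-in for the diamonds data; the values are
# invented for illustration and are not taken from the real data set.
diamonds = pd.DataFrame({
    "carat": [0.23, 0.21, 0.31, 0.70, 1.01, 1.50],
    "cut": ["Ideal", "Premium", "Good", "Ideal", "Premium", "Good"],
    "color": ["E", "E", "J", "G", "E", "J"],
    "price": [326, 326, 335, 2757, 4500, 9000],
    "x": [3.95, 3.89, 4.34, 5.70, 6.40, 7.30],
    "y": [3.98, 3.84, 4.35, 5.72, 6.43, 7.26],
    "z": [2.43, 2.31, 2.75, 3.52, 3.98, 4.50],
})

# 1. Length of the data: number of rows.
n_rows = len(diamonds)

# 2. Names of the columns: columns is an attribute, so no parentheses.
column_names = list(diamonds.columns)

# 3. and 4. First and last rows (5 each by default).
first_rows = diamonds.head()
last_rows = diamonds.tail()

# 5. Random selection: 3 rows, seeded so the choice is repeatable.
random_rows = diamonds.sample(n=3, random_state=0)

# 6. Summary statistics of the numeric columns.
summary = diamonds.describe()

# 7. Pairwise correlation, restricted to the numeric columns.
correlations = diamonds.select_dtypes(include="number").corr()

# 8. Distinct values of a non-numeric column, in order of appearance.
colors = diamonds["color"].unique()

print(n_rows)
print(column_names)
print(first_rows)
print(last_rows)
print(random_rows)
print(summary)
print(correlations)
print(colors)
```

On the real data, the same calls apply unchanged once diamonds has been loaded; only the construction of the frame at the top would differ.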
Observations so far:
OBSERVATION 1: The data has both numeric and non-numeric values.
OBSERVATION 2: The means and medians of x and y are approximately the same. Are a diamond's length and breadth proportionate?
OBSERVATION 3: The minimum of x, y and z is 0. Can the length, breadth or height of a real object be 0?
OBSERVATION 4: Diagonal correlations are all 1. Why?
OBSERVATION 5: Price, carat, x, y and z seem closely related to each other. Can a predictive model be developed? What about the non-numeric values?
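Some of these observations can be probed directly. The sketch below is a minimal illustration on a hypothetical frame with invented values, one row of which has zero dimensions to mimic OBSERVATION 3; the names dims and zero_rows are my own. It also hints at the answer to OBSERVATION 4: each diagonal entry of the correlation matrix compares a column with itself.

```python
import pandas as pd

# Hypothetical rows, invented for illustration; the fourth row has zero
# dimensions to mimic the kind of record behind OBSERVATION 3.
diamonds = pd.DataFrame({
    "carat": [0.23, 0.31, 0.70, 1.01, 1.50],
    "price": [326, 335, 2757, 4500, 9000],
    "x": [3.95, 4.34, 5.70, 0.0, 7.30],
    "y": [3.98, 4.35, 5.72, 0.0, 7.26],
    "z": [2.43, 2.75, 3.52, 0.0, 4.50],
})

dims = ["x", "y", "z"]

# OBSERVATION 2: mean and median of x and y side by side.
center = diamonds[["x", "y"]].agg(["mean", "median"])
print(center)

# OBSERVATION 3: rows where any dimension is exactly 0.
zero_rows = diamonds[(diamonds[dims] == 0).any(axis=1)]
print(len(zero_rows))

# OBSERVATION 4: the diagonal pairs each column with itself, so every
# diagonal entry is 1 (up to floating-point rounding).
corr = diamonds.corr()
diagonal = [corr.loc[name, name] for name in corr.columns]
print(diagonal)
```

Whether rows with zero dimensions should be dropped or corrected is a separate decision; the sketch only locates them.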