# ggplot in Python: Part 2

Continuing our study of the diamonds data, let us use some basic exploration functions and see what information we can draw from the data.

EXPLORE THE DATA:

##### 1. Length of the data:

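As a rough sketch of this step, assuming `diamonds` is a pandas DataFrame: the tiny hand-made frame below is only a hypothetical stand-in so the snippet runs on its own, and the real data set has many more rows and columns.

```python
import pandas as pd

# Hypothetical stand-in for the diamonds data (not the real values' full set).
diamonds = pd.DataFrame({
    "carat": [0.23, 0.21, 0.31],
    "cut": ["Ideal", "Premium", "Good"],
    "price": [326, 326, 335],
})

# len() on a DataFrame counts rows, not columns.
n_rows = len(diamonds)
print(n_rows)
```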
Use the len() function to see the number of rows of data.

##### 2. Names of the columns:

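A minimal sketch of listing the column names, again assuming a pandas DataFrame and using a small hypothetical stand-in for the diamonds data:

```python
import pandas as pd

# Hypothetical stand-in for the diamonds data.
diamonds = pd.DataFrame({
    "carat": [0.23, 0.21],
    "cut": ["Ideal", "Premium"],
    "price": [326, 326],
})

# columns is an attribute of the DataFrame, so it is read without parentheses.
column_names = list(diamonds.columns)
print(column_names)
```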
Use the columns attribute. It is an attribute of the data frame rather than a function, so it is written without parentheses.

##### 3. Analyse the first few values:

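Viewing the first rows might look like the sketch below. The seven-row frame is a hypothetical stand-in for the diamonds data, chosen so that the default of five rows is visible.

```python
import pandas as pd

# Hypothetical stand-in with seven rows.
diamonds = pd.DataFrame({
    "carat": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "price": [300, 400, 500, 600, 700, 800, 900],
})

first_rows = diamonds.head()   # default: the first 5 rows
first_two = diamonds.head(2)   # an explicit row count can be passed
print(first_rows)
```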
Use the head() function; by default, the first 5 rows are displayed.

##### 4. Analyse the last few values:

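The same idea applied to the end of the data, sketched on a hypothetical seven-row stand-in for the diamonds data:

```python
import pandas as pd

# Hypothetical stand-in with seven rows.
diamonds = pd.DataFrame({
    "carat": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "price": [300, 400, 500, 600, 700, 800, 900],
})

last_rows = diamonds.tail()   # default: the last 5 rows
last_two = diamonds.tail(2)   # an explicit row count can be passed
print(last_rows)
```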
Use the tail() function; by default, the last 5 rows are displayed.

##### 5. Random selection:

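One way to draw random rows from a pandas DataFrame is its `sample()` method. The sketch below uses a hypothetical stand-in for the diamonds data; the `random_state` value is an arbitrary choice that only makes the draw repeatable.

```python
import pandas as pd

# Hypothetical stand-in with seven rows.
diamonds = pd.DataFrame({
    "carat": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "price": [300, 400, 500, 600, 700, 800, 900],
})

# Draw 3 rows at random, without replacement.
random_rows = diamonds.sample(n=3, random_state=0)
print(random_rows)
```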
Use the sample() function to view rows chosen at random from a large data set.

##### 6. Statistical information:

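A small sketch of the summary statistics, assuming a pandas DataFrame. The frame is a hypothetical stand-in for the diamonds data, and it includes one text column to show that it is left out of the default summary.

```python
import pandas as pd

# Hypothetical stand-in with two numeric columns and one text column.
diamonds = pd.DataFrame({
    "carat": [0.2, 0.3, 0.4, 0.5, 0.6],
    "cut": ["Ideal", "Premium", "Good", "Ideal", "Fair"],
    "price": [300, 400, 500, 600, 700],
})

# By default describe() summarises only the numeric columns.
summary = diamonds.describe()
print(summary)

# The "50%" row holds the median of each numeric column.
median_price = summary.loc["50%", "price"]
```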
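Numeric fields can be summarised with the describe() function, which reports the count, mean, standard deviation, minimum, quartiles (the 50% row is the median) and maximum. The range follows from the minimum and maximum.

##### 7. Determine the correlation between fields: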

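A sketch of the correlation step on a hypothetical stand-in for the diamonds data. Selecting the numeric columns explicitly with `select_dtypes` is a choice made here so the snippet behaves the same across pandas versions; it is not necessarily how the original analysis was run.

```python
import pandas as pd

# Hypothetical stand-in with two numeric columns and one text column.
diamonds = pd.DataFrame({
    "carat": [0.2, 0.3, 0.4, 0.5, 0.6],
    "cut": ["Ideal", "Premium", "Good", "Ideal", "Fair"],
    "price": [300, 420, 510, 640, 700],
})

# Keep only numeric columns, then compute pairwise Pearson correlations.
numeric_fields = diamonds.select_dtypes(include="number")
corr_matrix = numeric_fields.corr()
print(corr_matrix)
```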
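The corr() function computes the pairwise correlation of the numeric fields in the data set. In recent pandas versions the non-numeric columns have to be excluded first, for example by passing numeric_only=True.

##### 8. Values stored in the non-numeric fields: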

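A brief sketch of inspecting a non-numeric column, using a hypothetical stand-in with a `color` column containing repeats:

```python
import pandas as pd

# Hypothetical stand-in; the color values repeat, as in the real data.
diamonds = pd.DataFrame({
    "color": ["E", "E", "I", "J", "I", "E"],
    "price": [326, 326, 334, 335, 336, 337],
})

all_colors = diamonds["color"]        # every row, repeats included
unique_colors = all_colors.unique()   # distinct values, in order of first appearance
print(unique_colors)
```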
We can view the values with diamonds['color'], but the column has many repeated values, so it is better to view only the unique entries. Use the unique() function for this.

### Observations so far:

OBSERVATION 1: The data has both numeric and non-numeric values.

OBSERVATION 2: The means and medians of x and y are approximately the same. Do diamonds have proportionate length and breadth?

OBSERVATION 3: The minimum of x, y and z is 0. Can the length, breadth or height of a real object be 0?

OBSERVATION 4: The diagonal entries of the correlation matrix are all 1. Why?

OBSERVATION 5: Price, carat, x, y and z seem closely related to each other. Can a predictive model be developed? What about the non-numeric values?
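Observation 3 can be followed up by locating the rows in which a dimension is recorded as 0. The sketch below assumes a pandas DataFrame with columns named x, y and z, and uses a hypothetical three-row stand-in rather than the real data.

```python
import pandas as pd

# Hypothetical stand-in; two of the three rows contain a zero dimension.
diamonds = pd.DataFrame({
    "x": [3.95, 0.0, 4.05],
    "y": [3.98, 0.0, 4.07],
    "z": [2.43, 0.0, 0.0],
})

# Flag rows where at least one of x, y, z equals 0.
has_zero_dim = (diamonds[["x", "y", "z"]] == 0).any(axis=1)
zero_dims = diamonds[has_zero_dim]
print(len(zero_dims))
```

Whether such rows are dropped or treated as missing values is a separate decision that depends on the analysis that follows.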