Basics of Matplotlib

Written as a part of ML101 Teaching_ML. Data visualization forms the crux of data modeling. We use data visualization to explore the data before the modeling step, and then again to present the final model in graphical form to a non-technical audience. There are numerous visualization libraries and tools available for Python or […]

String Manipulation in R

A data scientist loves numbers, but most human data is a combination of characters. Let me give you a small example: Price of Camera1 = $5000, Price of Camera2 = 5,000, Price of Camera3 = 5000. For you and me, all the values are the same, their average being 5000, but for a computer the first 2 values […]
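The camera example can be sketched in a few lines. The post itself is about R; this sketch uses Python, the language of the other posts listed here. The helper name `parse_price` and the cleaning rule (drop everything except digits and the decimal point) are assumptions made for illustration, not the method from the post.

```python
import re

def parse_price(raw):
    """Hypothetical helper: strip currency symbols and thousands separators."""
    cleaned = re.sub(r"[^0-9.]", "", raw)  # keep only digits and the decimal point
    return float(cleaned)

# The three prices from the example, as a computer would first see them: strings
prices = ["$5000", "5,000", "5000"]

values = [parse_price(p) for p in prices]
average = sum(values) / len(values)
print(values, average)
```

Once the symbols are stripped, all three strings convert to the same number and the average can be computed.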

ggplot in Python- Part 5

The main factor affecting price is the carat. In this post we shall evaluate how the two factors fare. PRICE VS CARATS. OBSERVATION: As the value of carat increases, the price goes up. The line of regression is quadratic. Thus, price is affected by carat, along with other factors. OBSERVATION: Upon further scaling, we […]

ggplot error in legends.

My ggplot data evaluation was going smoothly until I came across an error in my ggplots. When I tried to use the factors of cut, clarity, and color as differentiating factors in price vs volume, the legends did not show up. I had a colored and segmented graph, but it looked vague. […]

ggplot in Python- Part 4

Diamonds are costly, and their value is affected by various qualitative and quantitative factors. In this post we will try to evaluate some of the factors that contribute to making them costly. Check the density of diamond. PRICE VS LENGTH, PRICE VS BREADTH, PRICE VS HEIGHT, PRICE VS VOLUME, PRICE VS DEPTH, PRICE VS TABLE. Observations […]

ggplot in Python- Part 3

Before plotting our data, we must be aware of any null values in the qualitative fields and unnecessary zero values in the numeric fields. Such values can lead to incorrect statistical calculations and, even worse, errors while plotting the values and forming the line of best fit. 1. Check for any null values. 2. Check […]
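A minimal sketch of these checks, assuming the data sits in a pandas DataFrame. The small table below is a made-up stand-in for the diamond data, and the column names are illustrative only. Dropping the offending rows at the end is one possible follow-up, not necessarily what the post does.

```python
import pandas as pd

# Hypothetical stand-in for the diamond data; values are invented for illustration
df = pd.DataFrame({
    "cut": ["Ideal", None, "Good", "Premium"],
    "x": [3.95, 4.05, 0.0, 4.20],
    "y": [3.98, 4.07, 4.10, 0.0],
    "price": [326, 327, 334, 335],
})

# Count null values in every column
null_counts = df.isnull().sum()

# Count zero values in the numeric columns
numeric_cols = ["x", "y", "price"]
zero_counts = (df[numeric_cols] == 0).sum()

# One option (an assumption): keep only rows with no nulls and no zeros
clean = df.dropna()
clean = clean[(clean[numeric_cols] != 0).all(axis=1)]

print(null_counts)
print(zero_counts)
print(len(clean))
```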

ggplot in Python-Part 2

Continuing with our study of diamond data, let us employ basic exploration functions and see what information we can draw from them. EXPLORE THE DATA: 1. Length of the data: Use the len() function to see the number of rows of data. 2. Names of the columns: Use the column() function. 3. Analyse the first […]
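The first two exploration steps might look like the sketch below, assuming the diamond data is a pandas DataFrame. In pandas the column names are exposed through the `columns` attribute, so that is what the sketch uses for step 2. The three-row table is a made-up stand-in for the real data.

```python
import pandas as pd

# Hypothetical stand-in for the diamond data; values are invented for illustration
df = pd.DataFrame({
    "carat": [0.23, 0.21, 0.23],
    "cut": ["Ideal", "Premium", "Good"],
    "price": [326, 326, 327],
})

# 1. Length of the data: number of rows
n_rows = len(df)

# 2. Names of the columns (an attribute in pandas, not a function call)
column_names = list(df.columns)

print(n_rows)
print(column_names)
```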