NumPy is implemented in a lower-level language (largely C) and therefore has shorter computation times than equivalent pure-Python code.

Deals with vectors and matrices.

Can be used to import and preprocess data into Python directly.

Main feature: the N-D array class (ndarray), which is NumPy's counterpart to Python lists.

Matrix: A data structure with rows and columns, used mostly for mathematical operations…
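As a rough illustration of the points above, here is a minimal sketch comparing a plain Python list with a NumPy array. The values and variable names are made up for the example, and it assumes NumPy is installed.

```python
import numpy as np

# A plain Python list and the equivalent NumPy array (illustrative values).
values = [1.0, 2.0, 3.0, 4.0]
arr = np.array(values)

# Element-wise arithmetic: a loop for the list, one vectorized expression for the array.
doubled_list = [v * 2 for v in values]
doubled_arr = arr * 2

# A 2-D array plays the role of a matrix: here 2 rows and 3 columns.
matrix = np.array([[1, 2, 3], [4, 5, 6]])
transposed = matrix.T  # swaps rows and columns

print(doubled_arr)
print(matrix.shape, transposed.shape)
```

The vectorized `arr * 2` runs inside NumPy's compiled code rather than in a Python-level loop, which is where the speed advantage comes from.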


Any distribution can be standardized. Assume the population mean is µ and the standard deviation is σ; a distribution is in its standard form when (µ, σ²) = (0, 1).

A normal distribution can also be standardized using the formula

z = (x - µ)/σ

where z is the z-score. Consider the normally distributed data points below.

Normal distribution depicted by blue curve
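The standardization formula can be tried out on a small data set. The sketch below uses only the standard library; the data values are hypothetical and chosen so that the mean and standard deviation come out to round numbers.

```python
import statistics

# Hypothetical data points, treated as the whole population.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu = statistics.fmean(data)      # population mean
sigma = statistics.pstdev(data)  # population standard deviation

# z = (x - mu) / sigma for every data point
z_scores = [(x - mu) / sigma for x in data]

print(mu, sigma)
print(z_scores)
print(statistics.fmean(z_scores), statistics.pstdev(z_scores))
```

After standardizing, the z-scores have mean 0 and standard deviation 1, which matches the standard form (µ, σ²) = (0, 1).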


ACID properties:

Atomicity: All or nothing. If the transaction fails partway through, it rolls back entirely.

Consistency: A transaction takes the database from one valid state to another; for example, a transfer between two parties should be reflected on both sides.

Isolation: Transactions running concurrently should not affect each other.

Durability: Hardware and…
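Atomicity in particular is easy to see in a small experiment. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the accounts table, the names, and the transfer helper are all hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as a single transaction (hypothetical helper)."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# This transfer would make alice's balance negative, so the CHECK constraint fails
# on the second UPDATE and the first UPDATE is rolled back as well.
failed = transfer(conn, "alice", "bob", 500)
after_failed = dict(conn.execute("SELECT name, balance FROM accounts"))

# A valid transfer is applied on both sides.
succeeded = transfer(conn, "alice", "bob", 30)
after_success = dict(conn.execute("SELECT name, balance FROM accounts"))

print(failed, after_failed)
print(succeeded, after_success)
```

The credit to bob is applied first on purpose, so that the failed transfer shows a partial change being undone rather than never happening.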


Variability tells you how the data in a data set is distributed around the mean value, that is, how far the data is spread out. Variation exists in our daily lives; for example, the time we wake up every day varies over a range. Too…
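To make the wake-up example concrete, here is a small sketch with two hypothetical sets of wake-up times (in minutes after 6:00) that share the same mean but differ in spread. The numbers are invented for illustration.

```python
import statistics

# Hypothetical wake-up times, in minutes after 6:00, for two people.
steady = [58, 60, 61, 59, 62]
erratic = [20, 95, 60, 110, 15]

for label, times in (("steady", steady), ("erratic", erratic)):
    mean = statistics.fmean(times)
    value_range = max(times) - min(times)  # simplest measure of spread
    std_dev = statistics.pstdev(times)     # typical distance from the mean
    print(label, mean, value_range, round(std_dev, 2))
```

Both people wake up at 7:00 on average, yet the range and standard deviation show that one of them is far less predictable than the other.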


The Central Limit Theorem states that if we take many random samples of sufficiently large size from a population, the distribution of their sample means will be approximately normal, whatever the shape of the population distribution (provided its variance is finite). In descriptive statistics only a single sample is taken from the population and calculations are made based on it. However, the results are suboptimal and…
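A quick simulation can illustrate the theorem. The sketch below draws repeated samples from a deliberately skewed (exponential) population and looks at the sample means; the population, the sample size of 50, and the number of samples are all arbitrary choices made for this example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# A skewed, clearly non-normal population: exponential with mean 1.
population = [random.expovariate(1.0) for _ in range(20_000)]


def sample_means(population, sample_size, n_samples):
    """Return the mean of each of n_samples random samples."""
    return [
        statistics.fmean(random.sample(population, sample_size))
        for _ in range(n_samples)
    ]


sample_size = 50
means = sample_means(population, sample_size, n_samples=2000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
expected_se = pop_sd / sample_size ** 0.5  # sigma / sqrt(n)

print(pop_mean, mean_of_means)
print(expected_se, sd_of_means)
```

The sample means cluster around the population mean, and their spread is close to σ/√n. Plotting a histogram of `means` would show the familiar bell shape even though the population itself is skewed.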


Importing the data file

  1. There are two methods to import a data file into Python using pandas. You either pass the entire file path to pd.read_csv('file_path'), or you import os and call os.chdir('pythonFolder') to set the Python working directory to the folder containing the file, so that the notebook by default searches that directory for the…
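The two approaches can be sketched as follows. To keep the example runnable it first writes a tiny CSV into a temporary folder; the file name employees.csv and its contents are hypothetical, and the sketch assumes pandas is installed.

```python
import os
import tempfile

import pandas as pd

# Create a small, hypothetical CSV so the example is self-contained.
folder = tempfile.mkdtemp()
file_path = os.path.join(folder, "employees.csv")
with open(file_path, "w") as f:
    f.write("name,age\nAsha,31\nRavi,45\n")

# Method 1: pass the full file path.
df_full = pd.read_csv(file_path)

# Method 2: change the working directory, then use just the file name.
previous_dir = os.getcwd()
os.chdir(folder)
try:
    df_relative = pd.read_csv("employees.csv")
finally:
    os.chdir(previous_dir)  # restore the original working directory

print(df_full)
```

Both calls load the same data. Changing the working directory affects every later relative path in the session, which is why the sketch restores it afterwards.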


You can check my analytics projects on https://github.com/jay6445/Data-Analytics-Projects.git

A normal distribution is symmetric around the mean. In a sample of normally distributed data points, the points are distributed equally on either side of the mean. Working with a normal distribution helps us deal with outliers and makes inferential calculations much easier…


Following is an employee table

When we eyeball the above table we see numbers, words, letters and dates. The two main parent types of data are numerical and categorical.


Reading files with Open

  1. file1 = open('file path', 'mode') is used to open the file you want to load.
  2. The mode can be 'r' for reading, 'w' for writing, or 'a' for appending, that is, when you want to add to the existing file instead of creating a new one.
  3. Always remember to close the file…
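The three modes can be tried out in a few lines. This is a minimal sketch that works on a temporary file with a made-up name (notes.txt); it shows both an explicit close() and the with statement, which closes the file automatically.

```python
import os
import tempfile

# Hypothetical file in a temporary folder, so nothing real is overwritten.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# 'w' creates the file (or truncates an existing one); close it explicitly.
file1 = open(path, "w")
file1.write("first line\n")
file1.close()

# 'a' appends to the existing file; 'with' closes it when the block ends.
with open(path, "a") as file2:
    file2.write("second line\n")

# 'r' opens the file for reading.
with open(path, "r") as file3:
    lines = file3.read().splitlines()

print(lines)
print(file1.closed, file2.closed, file3.closed)
```

Using with is usually the safer habit, because the file is closed even if an error occurs inside the block.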

You can check out my projects, such as a chat analysis where I used regex, on my GitHub: jay6445/Data-Analytics-Projects

  1. Regular expressions are useful for many projects, including but not limited to web scraping and chat or email analysis, to extract useful information from a large text file.
  2. Regex library in Python…
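As an illustration of the chat-analysis use case, here is a minimal sketch using Python's built-in re module. The chat log, its line format, and both patterns are assumptions made up for this example, not the format used in the project mentioned above.

```python
import re

# Hypothetical chat export: "date, time - sender: message" on each line.
chat_log = """12/03/2021, 09:15 - Asha: Meeting moved to 10:30
12/03/2021, 09:17 - Ravi: Ok, mail me at ravi@example.com
13/03/2021, 18:02 - Asha: Done, cc asha.k@example.org"""

# One capture group each for date, time, sender, and message text.
line_pattern = re.compile(
    r"^(\d{2}/\d{2}/\d{4}), (\d{2}:\d{2}) - ([^:]+): (.*)$",
    re.MULTILINE,
)
messages = line_pattern.findall(chat_log)
senders = [sender for _date, _time, sender, _text in messages]

# A deliberately simple email pattern; real-world addresses can be more varied.
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", chat_log)

print(senders)
print(emails)
```

Once each line is split into fields like this, counting messages per sender or per day becomes a matter of ordinary list or DataFrame operations.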

Jayesh Rao

Hi good to see y'all, I am an aspiring data analyst and will be posting stuff about Statistics, Python and R and also some interesting projects I do. B-)
