# Python: NumPy Basics

NumPy's core is implemented in C, a lower-level language, so its array operations run much faster than equivalent pure-Python loops.

Deals with vectors and matrices.

Can be used to import and preprocess data into Python directly.

Main feature: the N-dimensional array class (`ndarray`), which is NumPy's counterpart to Python lists.

Matrix: A data structure with rows and columns, used mostly for mathematical operations. A collection of vectors.

Scalar: A single element. All the numbers from algebra are referred to as scalars in linear algebra.

Vector: Single dimensional object. Vector can be a column or row vector.

`#vector operations`

`vector = np.append(vector, new_value) #add a new value (or a second vector) to the vector; np.append returns a new array`

`vector[index] = new_value #update existing value`

`del v[0]…`
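The vector operations above can be tried out with a short sketch like the one below. The array values and the variable names (`extended`, `combined`, `shortened`) are made up for illustration. The sketch also uses `np.delete`, which is an assumption about how one might remove an element from a NumPy array, since `del` on an array element raises an error.

```python
import numpy as np

vector = np.array([1, 2, 3])

# np.append does not modify `vector`; it returns a new, longer array.
extended = np.append(vector, 4)                  # append a single value
combined = np.append(vector, np.array([5, 6]))   # append another vector

# Updating an existing value happens in place.
vector[0] = 10

# np.delete also returns a new array with the element at index 0 removed.
shortened = np.delete(vector, 0)

print(extended, combined, vector, shortened)
```

Because `np.append` and `np.delete` return copies, the result has to be assigned to a name to be kept.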

# Standard Normal Distribution and z-score/Z-Statistic

Any distribution can be standardized. Assume the population mean is µ and the standard deviation is σ; a distribution is in its standard form when (µ, σ²) = (0, 1).

Normal distribution can also be standardized using the formula

$$z = \frac{x - \mu}{\sigma}$$

Where z is the Z score. Consider the below normally distributed data points.

When the mean is subtracted from every data point x (using the sample mean as the estimate of the population mean), the graph shifts towards the origin and µ becomes zero, as shown below.
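As a rough numeric illustration of standardization, the sketch below computes z-scores for a small made-up data set using Python's built-in `statistics` module. The data values are hypothetical, and the population standard deviation (`pstdev`) is used as σ.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data points

mu = statistics.fmean(data)      # mean of the data
sigma = statistics.pstdev(data)  # population standard deviation

# Subtract the mean from every point, then divide by the standard deviation.
z_scores = [(x - mu) / sigma for x in data]

# After standardizing, the mean is 0 and the standard deviation is 1.
z_mean = statistics.fmean(z_scores)
z_sd = statistics.pstdev(z_scores)
print(z_scores, z_mean, z_sd)
```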

# MySQL Quick Reference (Syntax Version)

ACID properties:

Atomicity: All or nothing. If a transaction fails partway through, it rolls back entirely.

Consistency: A transaction must leave the database in a valid state. For example, if a transaction has occurred between two parties, it should be reflected on both sides.

Isolation: If multiple transactions are running concurrently, they should not affect each other.

Durability: Once a transaction is committed, a hardware or software failure must not cause its data to be lost.

DML: insert, update, delete, merge

DDL: create, drop, truncate (deletes all the rows; `WHERE` cannot be used), alter, rename.

DQL: select

DCL: grant, revoke

TCL: commit, rollback, savepoint (sets a savepoint within a transaction).
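Atomicity together with commit and rollback can be sketched in a few lines. The example below uses Python's built-in `sqlite3` module with an in-memory database rather than MySQL, purely so that it runs without a server. The `accounts` table, the names and the `transfer` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move `amount` between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit above was rolled back too.
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice: rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits one account before the debit fails, yet neither change survives, which is the "all or nothing" behaviour described under Atomicity.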

Operator order of precedence: SQL will always give…

# Measures of Variability — Range, Variance, Std. Deviation, Coefficient of Variation

Variability tells you how the data in a data set is spread around the mean, that is, how far the values lie from it. Variation exists in our daily lives: for example, the time we wake up varies from day to day over a range. Too much variability affects other events during the day, and the outcome might not be favorable. Similarly, if your favorite dish in a restaurant varied a lot, you would not like that. This variation can be measured. Following is an example
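The four measures named in the title can be computed with a short sketch like this one. The wake-up times are made-up numbers (minutes after 6:00 am), and the sample versions of variance and standard deviation are used here, which is an assumption about the convention intended.

```python
import statistics

# Hypothetical wake-up times over five days, in minutes after 6:00 am.
wake_up = [0, 10, 20, 30, 40]

data_range = max(wake_up) - min(wake_up)   # largest value minus smallest value
variance = statistics.variance(wake_up)    # sample variance (divides by n - 1)
std_dev = statistics.stdev(wake_up)        # sample standard deviation
mean = statistics.fmean(wake_up)
coeff_of_variation = std_dev / mean        # spread relative to the mean

print(data_range, variance, std_dev, coeff_of_variation)
```

The coefficient of variation is unitless, so it can be used to compare the spread of data sets measured on different scales.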

Like the 3 measures of central tendency we have many…

# Central Limit Theorem (CLT)

The Central Limit Theorem states that if we take many sufficiently large random samples from a population, the distribution of their sample means will be approximately normal, whatever the shape of the population distribution. In descriptive statistics, only a single sample is taken from the population and calculations are based on it, so the results can be far from the true population values. Therefore, in CLT:

1. We take multiple samples of size n ≥ 30 from the population. (A common rule of thumb is that each sample should have at least 30 observations.)
2. Calculate the mean of each sample.
3. The distribution of samples is called Sampling distribution and the distribution of sampling means is called sampling…
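The sampling steps above can be simulated with the standard library. In this sketch the population is drawn from an exponential distribution, chosen only because it is clearly not normal. The population size, the sample size of 50 and the number of samples are arbitrary assumptions.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# A skewed, clearly non-normal population (exponential with mean about 1).
population = [random.expovariate(1.0) for _ in range(20_000)]

n = 50               # observations per sample (at least 30)
num_samples = 2_000  # how many samples we draw

# Steps 1 and 2: draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(num_samples)
]

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = pop_sd / math.sqrt(n)  # theoretical spread of the sample means

print(pop_mean, mean_of_means)
print(expected_sd, sd_of_means)
```

The sample means cluster around the population mean, and their spread is close to the population standard deviation divided by the square root of n, even though the population itself is skewed.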

# Importing the data file

1. There are two ways to import a data file into Python using pandas. You can pass the full file path to `pd.read_csv('file_path')`, or you can run `import os` followed by `os.chdir('pythonFolder')` to set Python's working directory to the folder that holds the data, so that the notebook looks there for data files by default. You can check the working directory with `print(os.getcwd())`.
2. `import pandas as pd` imports the library we need.
3. `pd.read_csv()` is the most flexible and most commonly used reader.
4. `skiprows` and `skipfooter` are additional parameters that can be passed to `pd.read_csv()`. `sheet_name` is a parameter of `pd.read_excel()`, not of `pd.read_csv()`.
5. To save a file we can use…
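A minimal sketch of the import steps is shown below. It assumes pandas is installed, and it reads from an in-memory string (`io.StringIO`) instead of a real file so that no path has to be guessed. The CSV content is made up.

```python
import io
import os

import pandas as pd

# Where Python currently looks for files given as relative paths.
print(os.getcwd())

# Hypothetical CSV with one junk line at the top and one at the bottom.
raw = io.StringIO(
    "Exported report\n"
    "name,score\n"
    "Ann,90\n"
    "Raj,85\n"
    "Total rows: 2\n"
)

# skiprows drops lines from the top and skipfooter drops lines from the bottom.
# skipfooter is only supported by the slower Python parsing engine.
df = pd.read_csv(raw, skiprows=1, skipfooter=1, engine="python")
print(df)
```

With a real file, the `raw` object would be replaced by the file path string.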

# Normal/Gaussian Distribution/Bell Curve

You can check my analytics projects on https://github.com/jay6445/Data-Analytics-Projects.git

Normal distribution is symmetric around the mean. In a normally distributed sample, the data points are spread evenly on either side of the mean. Working with normally distributed data makes it easier to spot outliers and makes inferential calculations much easier, since comparing two normally distributed variables is simpler than comparing two variables that follow different distributions, a situation that is very common in real-world analysis. The process of converting a distribution into a normal distribution is called Normalization. One of the important characteristics of a normal distribution is that it has no…

# Types of Data and Measurements

Following is an employee table

When we eyeball the above table we see numbers, words, letters and dates. The two main parent types of data are Numerical and Categorical data.

# Python: Working with Data

1. `file1 = open('file path' , 'mode')` is used to open the file you want to load.
2. Mode can be ‘r’ for reading, ‘w’ for writing or ‘a’ for appending, that is, adding to an existing file instead of creating a new one or overwriting it.
3. Always remember to close the file with `file1.close()`.
4. You can use `file1.name` to get the file name. It is an attribute, so it takes no parentheses.
5. Using `with open('file path' , 'mode') as file1: fileContent = file1.read()` is a better practice as it automatically closes the file.
6. `fileContent = file1.readlines()` reads every line of the file and stores them in a list, while `file1.readline()` reads a single line.
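The points above can be put together in a small sketch. It writes to a temporary directory so that nothing on disk is overwritten. The file name `example.txt` and its contents are made up.

```python
import os
import tempfile

# Hypothetical file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# 'w' creates the file (or overwrites an existing one).
with open(path, "w") as file1:
    file1.write("first line\nsecond line\n")

# 'a' adds to the end of the existing file.
with open(path, "a") as file1:
    file1.write("third line\n")

# 'r' opens the file for reading; `with` closes it automatically.
with open(path, "r") as file1:
    file_name = file1.name          # attribute, not a method
    first_line = file1.readline()   # one line, as a string
    remaining = file1.readlines()   # all remaining lines, as a list

is_closed = file1.closed            # True once the with-block has ended
print(file_name, first_line, remaining, is_closed)
```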

# Writing files with Open

1. `file1 = open('file path'…`

# Using Regex in Python

You can check out my projects, such as a chat analysis where I used regex, on my GitHub: jay6445/Data-Analytics-Projects

1. Regular expressions are useful for many projects, including but not limited to web scraping and chat or email analysis, to extract useful information from a large text file.
2. The regex library in Python makes our lives simple: certain symbols are assigned to various kinds of characters, and they can be combined into a pattern to extract information from the text.
3. `import re` imports the library we need.
4. `re` is part of Python's standard library, so it does not need to be installed with `pip`.

# Symbols

1. `\d` matches the decimal digits…
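A short sketch of `\d` in use is given below. The sample text is made up, and the `+` quantifier ("one or more") is standard `re` syntax that is assumed here rather than taken from the list above.

```python
import re

text = "Order 66 shipped on 2021-03-04 to 3 customers"  # hypothetical text

# \d matches one decimal digit at a time.
single_digits = re.findall(r"\d", text)

# \d+ matches runs of one or more digits, i.e. whole numbers.
numbers = re.findall(r"\d+", text)

print(single_digits)
print(numbers)
```

Writing the pattern as a raw string (`r"..."`) keeps Python from treating the backslash as an escape character.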

## Jayesh Rao

Hi good to see y'all, I am an aspiring data analyst and will be posting stuff about Statistics, Python and R and also some interesting projects I do. B-)
