NumPy's core is implemented in a lower-level language (C), so its array operations run faster than equivalent pure-Python loops.
Deals with vectors and matrices.
Can be used to import and preprocess data into Python directly.
Main feature: the N-dimensional array class (ndarray), which plays a role similar to Python lists.
Matrix: A data structure with rows and columns, used mostly for mathematical operations. A collection of vectors.
Scalar: A single element. All the numbers from algebra are referred to as scalars in linear algebra.
Vector: A one-dimensional object. A vector can be a column vector or a row vector.
vector = np.append(vector, new_value) #add a new value (or a second vector, vector2) to the vector; np.append returns a new array
vector[index] = new_value #update existing value
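A minimal sketch of the two operations above, using made-up values. The names appended_scalar, appended_vector and vector2 are only illustrative. Note that np.append does not change the original array; it returns a new one.

```python
import numpy as np

vector = np.array([1, 2, 3])

# np.append returns a new array; the original vector is left untouched
appended_scalar = np.append(vector, 4)

# a second vector can be appended the same way
vector2 = np.array([5, 6])
appended_vector = np.append(vector, vector2)

# update an existing value in place by index
vector[0] = 10

print(appended_scalar)
print(appended_vector)
print(vector)
```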
Any distribution can be standardized. If the population mean is µ and the standard deviation is σ, a distribution is in its standard form when (µ, σ²) = (0, 1).
A normal distribution can also be standardized using the formula z = (x - µ) / σ,
where z is the Z score. Consider the normally distributed data points below.
When the sample mean (our best estimate of the population mean) is subtracted from every data point x, the graph shifts towards the origin and µ becomes zero, as shown below.
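The two steps of standardization (centering on the mean, then scaling by the standard deviation) can be sketched with NumPy as follows. The data values are hypothetical, and the sample mean and standard deviation stand in for µ and σ.

```python
import numpy as np

# hypothetical data points
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mu = x.mean()      # used as the estimate of the population mean
sigma = x.std()    # population-style standard deviation (ddof=0)

centered = x - mu          # step 1: the mean moves to zero
z = centered / sigma       # step 2: the standard deviation becomes one

print(z.mean(), z.std())
```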
Atomicity: All or nothing. If the transaction fails partway through, it is rolled back entirely.
Consistency: A transaction must leave the database in a valid state. If it occurs between two parties, it should be reflected on both sides.
Isolation: If multiple transactions are running concurrently, they should not affect each other.
Durability: Once a transaction is committed, a hardware or software failure must not cause its data to be lost.
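Atomicity can be illustrated with a small sketch using Python's built-in sqlite3 module. The accounts table, the names and the transfer helper are hypothetical. The connection is used as a context manager, which commits on success and rolls back if an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, sender, receiver, amount):
    """Move amount between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, receiver))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, sender))
        return True
    except sqlite3.IntegrityError:
        # the CHECK constraint failed, so the credit above was rolled back too
        return False

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits bob before the debit fails, yet bob's balance is unchanged afterwards, which is the all-or-nothing behaviour described above.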
DML: insert, update, delete, merge
DDL: create, drop, truncate (deletes all the rows; WHERE cannot be used), alter, rename.
DCL: grant, revoke
TCL: commit, rollback, savepoint (sets a savepoint within a transaction).
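A rough sketch of how DDL, DML and TCL statements fit together, run through Python's sqlite3 module. The items table and the savepoint name are hypothetical. DCL is not shown because SQLite has no GRANT or REVOKE, and it also lacks TRUNCATE and MERGE.

```python
import sqlite3

# isolation_level=None lets us issue BEGIN/COMMIT ourselves
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()

cur.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")  # DDL

cur.execute("BEGIN")                                          # TCL: start transaction
cur.execute("INSERT INTO items (name) VALUES ('pen')")        # DML
cur.execute("SAVEPOINT before_update")                        # TCL: set savepoint
cur.execute("UPDATE items SET name = 'pencil' WHERE id = 1")  # DML
cur.execute("ROLLBACK TO before_update")                      # TCL: undo the update only
cur.execute("COMMIT")                                         # TCL: keep the insert

names = [row[0] for row in cur.execute("SELECT name FROM items")]
print(names)
```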
Operator order of precedence: SQL will always give…
Variability tells you how the data in a data set is spread around the mean value. Variation exists in our daily lives: for example, the time we wake up every day varies over a range. Too much variability affects other events during the day, and the outcome might not be favorable. Similarly, if your favorite dish in a restaurant varies a lot from visit to visit, you would not like that. This variation can be measured. The following is an example.
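As a small sketch of how such variation can be measured, the snippet below computes the range, variance and standard deviation of some wake-up times. The values are hypothetical, expressed as minutes after 6:00 am.

```python
import numpy as np

# hypothetical wake-up times over five days, in minutes after 6:00 am
wake_up = np.array([0, 15, 30, 45, 60])

mean = wake_up.mean()
value_range = wake_up.max() - wake_up.min()  # spread between the extremes
variance = wake_up.var()                     # average squared distance from the mean
std_dev = wake_up.std()                      # same units as the data (minutes)

print(mean, value_range, variance, std_dev)
```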
Like the 3 measures of central tendency we have many…
The Central Limit Theorem (CLT) states that if we take many sufficiently large random samples from a population with finite variance, the distribution of their sample means will be approximately normal, whatever the shape of the population distribution. In descriptive statistics only a single sample is taken from the population and calculations are made based on it. However, the results can be suboptimal and far from the population statistics. Therefore, in CLT
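The idea can be checked with a short simulation. This is a sketch under assumed settings: an exponential (clearly non-normal) population, 2000 samples and a sample size of 50 are all arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# a skewed, non-normal population
population = rng.exponential(scale=2.0, size=100_000)

# draw 2000 random samples of size 50 and take the mean of each
samples = rng.choice(population, size=(2000, 50))
sample_means = samples.mean(axis=1)

# CLT: the sample means cluster around the population mean, with a spread
# close to (population standard deviation) / sqrt(sample size)
expected_spread = population.std() / np.sqrt(50)

print(population.mean(), sample_means.mean())
print(expected_spread, sample_means.std())
```

Plotting a histogram of sample_means would show the familiar bell shape even though the population itself is heavily skewed.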
import os and os.chdir('pythonFolder') set the Python working directory to the given folder, so that the notebook by default searches that directory for the data files. You can check the working directory by using
import pandas as pd imports the library we need.
pd.read_csv() is the most flexible and most commonly used.
skiprows = "" and skipfooter = "" are additional parameters that can be used with pd.read_csv(); sheet_name = "" applies to pd.read_excel() only.
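A minimal sketch of skiprows and skipfooter in pd.read_csv(). The CSV content is made up and read from memory with io.StringIO so that the example does not depend on a file. skipfooter is used with engine="python" here, since the default C engine does not support it.

```python
import io
import pandas as pd

# hypothetical file content: one junk line on top, one summary line at the bottom
raw = "\n".join([
    "report generated by a hypothetical tool",
    "name,score",
    "ann,90",
    "bob,85",
    "total,175",
])

df = pd.read_csv(
    io.StringIO(raw),
    skiprows=1,       # ignore the first line
    skipfooter=1,     # ignore the last line
    engine="python",  # the parser engine that supports skipfooter
)
print(df)
```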
You can check my analytics projects on https://github.com/jay6445/Data-Analytics-Projects.git
A normal distribution is symmetric around the mean. In a sample of data points, the points will be distributed equally on either side of the mean. A normal distribution helps us identify outliers and makes inferential calculations much easier, as it is faster to compare two normally distributed variables than two variables following different distributions, which is very common in real-world analysis. The process of converting a distribution into a normal distribution is often called normalization. One of the important characteristics of a normal distribution is that it has no…
file1 = open('file path' , 'mode') is used to open the file you want to load.
file1.name gives the file name (it is an attribute, not a method).
with open('file path' , 'mode') as file1: fileContent = file1.read() is a better practice, as it automatically closes the file.
fileContent = file1.readlines() reads every line of the file and stores the lines in a list; file1.readline() reads a single line.
file1 = open('file path'…
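The file operations above can be tried out with the following sketch. It writes a small temporary file first so that it is self-contained. The file name example.txt and its content are hypothetical.

```python
import os
import tempfile

# create a small throwaway file to read from
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as file1:
    file1.write("first line\nsecond line\n")

with open(path, "r") as file1:
    file_name = file1.name               # attribute, not a method call
    first_line = file1.readline()        # reads one line
    remaining_lines = file1.readlines()  # reads the rest into a list

with open(path, "r") as file1:
    file_content = file1.read()          # whole file as one string

# the with block has already closed the file at this point
print(file1.closed)
```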
You can check out my projects, such as the chat analysis where I have used regex, on my GitHub: jay6445/Data-Analytics-Projects
import re: this is the library we need.
re is part of the Python standard library, so no pip install is needed in the Anaconda terminal.
\d matches the decimal digits…
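A short sketch of \d in use with the re module. The sample text is made up.

```python
import re

text = "Order 66 shipped on 2024-05-01"

# \d matches one decimal digit; \d+ matches a run of one or more digits
single_digits = re.findall(r"\d", "a1b22")
numbers = re.findall(r"\d+", text)

print(single_digits)
print(numbers)
```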