ACID properties:
Atomicity: All or nothing. If the transaction fails partway through, it rolls back entirely.
Consistency: A transaction takes the database from one valid state to another, so all rules and constraints still hold. For example, if money moves between two parties, the change must be reflected on both sides.
Isolation: If multiple transactions are running concurrently, they should not affect each other.
Durability: Once a transaction is committed, a hardware or software failure must not cause its data to be lost.
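As a minimal sketch of atomicity (and of a constraint protecting consistency), the following uses Python's built-in sqlite3 module. The accounts table, the names in it, and the transfer helper are hypothetical and exist only for illustration.

```python
import sqlite3

# In-memory database with a hypothetical accounts table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between two accounts; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

# Alice only has 100, so the second UPDATE violates the CHECK constraint
# and the credit already applied to Bob is rolled back as well.
ok = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)
```

The failed transfer leaves both balances exactly as they were, which is the "all or nothing" behavior described above.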
DML: insert, update, delete, merge
DDL: create, drop, truncate (deletes all the rows; WHERE cannot be used), alter, rename
DQL: select
DCL: grant, revoke
TCL: commit, rollback, savepoint (sets a savepoint within a transaction)
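To make the categories concrete, here is a small sketch using Python's built-in sqlite3 module, with a hypothetical items table. It shows DDL, DML, DQL and the TCL savepoint in action. SQLite has no user accounts, so DCL (grant, revoke) is not shown.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# BEGIN / SAVEPOINT / COMMIT statements below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE items (name TEXT)")            # DDL

conn.execute("BEGIN")                                     # TCL: start a transaction
conn.execute("INSERT INTO items VALUES ('kept')")         # DML
conn.execute("SAVEPOINT before_second")                   # TCL: mark a point to return to
conn.execute("INSERT INTO items VALUES ('discarded')")    # DML
conn.execute("ROLLBACK TO SAVEPOINT before_second")       # undo only the second insert
conn.execute("COMMIT")                                    # TCL: make the rest permanent

rows = [r[0] for r in conn.execute("SELECT name FROM items")]  # DQL
print(rows)
```

Rolling back to the savepoint discards only the work done after it, while the first insert is still committed.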
Operator order of precedence: SQL evaluates AND before OR unless parentheses say otherwise. …
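The effect of this precedence can be sketched with Python's built-in sqlite3 module. The emp table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    ("Ana", "HR", 40), ("Ben", "IT", 40), ("Cy", "IT", 90), ("Di", "HR", 90),
])

# Without parentheses this is read as: dept = 'HR' OR (dept = 'IT' AND salary > 50)
no_parens = [r[0] for r in conn.execute(
    "SELECT name FROM emp "
    "WHERE dept = 'HR' OR dept = 'IT' AND salary > 50 ORDER BY name")]

# Parentheses force the OR to be evaluated first.
with_parens = [r[0] for r in conn.execute(
    "SELECT name FROM emp "
    "WHERE (dept = 'HR' OR dept = 'IT') AND salary > 50 ORDER BY name")]

print(no_parens)
print(with_parens)
```

The first query returns every HR employee plus the well-paid IT employee, while the second returns only well-paid employees from either department.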
Variability tells you how the data in a data set is spread around its mean value, that is, how far the values lie from the center. Variation exists in our daily lives: for example, the time we wake up every day varies over a range. Too much variability affects other events during the day, and the outcome might not be favorable. Similarly, if your favorite dish at a restaurant varied a lot from visit to visit, you would not like that. This variation can be measured. Following is an example
Like the 3 measures of central tendency, we have many measures of variability…
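As a rough illustration, three common measures of variability (range, variance and standard deviation) can be computed with Python's statistics module. The numbers below are invented for the example, and the population versions of the formulas are used.

```python
import statistics

# Hypothetical observations, e.g. minutes past 6:00 at which someone wakes up.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
data_range = max(data) - min(data)        # distance between the extremes
variance = statistics.pvariance(data)     # average squared distance from the mean
std_dev = statistics.pstdev(data)         # square root of the variance

print(mean, data_range, variance, std_dev)
```

A larger range, variance or standard deviation means the values sit further from the mean on average.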
The Central Limit Theorem states that if we take multiple random samples of sufficiently large size from a population, the distribution of their sample means will be approximately Normal, whatever the shape of the population (provided it has a finite variance). In descriptive statistics, only a single sample is taken from the population and calculations are made based on it. However, the results from one sample can be far from the population statistics. Therefore, in CLT
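A small simulation can make the idea tangible. This sketch assumes an exponential population (strongly skewed, with true mean 1.0); the sample size of 50 and the 2000 repetitions are arbitrary choices for the illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential distribution (skewed, true mean 1.0)."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])

# Take many samples and keep only the mean of each one.
sample_means = [sample_mean(50) for _ in range(2000)]

center = statistics.fmean(sample_means)   # should sit near the population mean
spread = statistics.stdev(sample_means)   # should sit near 1 / sqrt(50)

print(round(center, 3), round(spread, 3))
```

Even though individual draws are far from bell-shaped, the sample means cluster around the population mean, and plotting a histogram of sample_means would show a roughly symmetric, bell-like shape.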
pd.read_csv('file_path')
loads a CSV file from the given path. Alternatively, import os and call
os.chdir('pythonFolder')
to set the Python working directory to the folder that holds your data, so that the notebook searches that directory for the data files by default. You can check the working directory by using print(os.getcwd())
import pandas as pd
is the library we need.
pd.read_csv()
is the most flexible and most commonly used reader. skiprows and skipfooter are additional parameters that can be used with it; sheet_name is a parameter of pd.read_excel(), not pd.read_csv().
You can check my analytics projects on https://github.com/jay6445/Data-Analytics-Projects.git
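The loading calls above can be tried without any file on disk. In this sketch the CSV text is made up and wrapped in io.StringIO so that pd.read_csv can read it like a file; skipfooter is used with engine="python", which is the parser that supports it.

```python
import io
import os

import pandas as pd

print(os.getcwd())  # the folder where relative file paths are resolved

# Hypothetical CSV content: one junk line on top, one totals line at the bottom.
raw = "report generated for demo\nname,score\nAna,90\nBen,85\ntotal,175"

df = pd.read_csv(
    io.StringIO(raw),   # stands in for 'file_path'
    skiprows=1,         # ignore the first line
    skipfooter=1,       # ignore the last line
    engine="python",    # skipfooter needs the Python parser
)
print(df)
```

With a real file, the first argument would simply be the path, either absolute or relative to the working directory printed at the top.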
Normal distribution is symmetric around the mean. In a normally distributed sample, the data points are spread roughly equally on either side of the mean. Working with normal distributions makes outliers easier to spot and makes inferential calculations much easier, as it is simpler to compare two normally distributed variables than two variables following different distributions, which is very common in real-world analysis. The process of converting a distribution into a normal distribution is often called Normalization. One important characteristic of a normal distribution is that it has no skew, which is what we mean when we say the distribution is symmetric. In this distribution Mean = Median = Mode. …
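The symmetry property can be checked on a toy example. The data set below is invented and perfectly symmetric, and statistics.NormalDist is used only to show that half of a normal distribution lies below its mean.

```python
import statistics

# Hypothetical, perfectly symmetric data centered on 3.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# For a normal distribution, exactly half of the probability lies below the mean.
share_below_mean = statistics.NormalDist(mu=3, sigma=1).cdf(3)

print(mean, median, mode, share_below_mean)
```

Real samples are rarely this tidy, so in practice the three measures are close rather than identical.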
Following is an employee table
When we eyeball the above table we see numbers, words, letters and dates. The two main parent types of data are Numerical and Categorical.
file1 = open('file path', 'mode')
is used to open the file you want to load.
file1.close()
closes the file when you are done with it.
file1.name
to get the file name (name is an attribute, so it takes no parentheses).
with open('file path', 'mode') as file1: fileContent = file1.read()
is a better practice as it automatically closes the file.
fileContent = file1.readlines()
reads every line of the file and stores them in a list, whereas file1.readline() reads a single line.
file1 = open('file path', 'w')
is used to open the file in write mode. …
You can check out my projects, such as a chat analysis where I have used regex, on my GitHub: jay6445/Data-Analytics-Projects
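The file-handling calls above can be put together in one short sketch. The file name notes.txt and its contents are hypothetical, and a temporary folder is used so nothing is left behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.txt")

    # 'w' creates the file (or overwrites it) for writing.
    with open(path, "w") as file1:
        file1.write("first line\nsecond line\n")

    # 'r' opens the file for reading; the with block closes it automatically.
    with open(path, "r") as file1:
        file_name = os.path.basename(file1.name)  # name is an attribute
        first = file1.readline()                  # one line, including the newline
        rest = file1.readlines()                  # the remaining lines as a list

    closed_after_with = file1.closed              # True once the block has ended

print(file_name, repr(first), rest, closed_after_with)
```

Note that readline() and readlines() continue from wherever the previous read stopped, which is why rest holds only the second line.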
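import re
this is the library we need. re is part of the Python standard library, so there is no need to run pip install re on the Anaconda terminal.
\d
matches a decimal digit.
\D
matches any character that is not a decimal digit. …
Take a look at the below advertising claim.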
Nike claims that its Vaporfly shoes make a runner 4 percent more efficient; in other words, it claims a performance increase when you wear this pair of shoes. This is a big claim. But the main question is: how was Nike able to make such a claim? On what basis? How was the research carried out? Oh, Nike must have asked a person to wear their regular shoes followed by these shoes, run laps of equal distance, and calculated the time differences, right? No, that would make for a highly unreliable claim. It is also not humanly possible to test these shoes on marathon runners across the globe. What Nike did was take a random sample of the target population, amateur marathon runners, test the product on them, and calculate the mean and median of the time difference in the sample. It also captured latent measures of how people felt and drew correlations as to what would eventually make them efficient in the long run. No pun intended. …
You can check out my basic python notebooks and projects on https://github.com/jay6445/Learning-Analytics.git
(1, 2, 34, 55, 10, 12)
a = (1, 2, 34, 55, 10, 12)
then a[0] = 1
a[-1] = 12
likewise a[-6] = 1 in the above tuple.
a = (1, 12, 34.6, 'a', 'Zebra')
(1, 22, 34, 45) + ('Zebra', 'Cats') = (1, 22, 34, 45, 'Zebra', 'Cats')
a = (1, 2, 34, 55, 10…
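The tuple operations listed above can be run as a short script. The variable names other than a are chosen just for this sketch.

```python
a = (1, 2, 34, 55, 10, 12)

first = a[0]        # indexing starts at 0
last = a[-1]        # negative indexes count from the end
also_first = a[-6]  # -6 reaches back to the first of six elements

# A tuple can hold values of different types.
mixed = (1, 12, 34.6, 'a', 'Zebra')

# The + operator joins two tuples into a new one.
joined = (1, 22, 34, 45) + ('Zebra', 'Cats')

print(first, last, also_first)
print(joined)
```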