Python: Working with Data
Reading files with Open
file1 = open('file path', 'mode') is used to open the file you want to load.
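- Mode can be 'r' for reading, 'w' for writing (this creates the file, or overwrites it if it already exists) or 'a' for appending, which writes to the end of an existing file instead of replacing its contents.
- Always remember to close the file with file1.close() when you are done with it.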
- You can use file1.name to get the file name (it is an attribute, not a method, so there are no parentheses).
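with open('file path', 'mode') as file1: fileContent = file1.read() is better practice, as it closes the file automatically at the end of the with block. Here file1.read() reads the whole file into a single string.
fileContent = file1.readlines() reads all the lines of the file and stores them in a list, while file1.readline() reads only the next line as a string.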
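A short sketch of the reading methods above, run against a small sample file created in a temporary folder (the file name example.txt and its contents are made up for illustration):

```python
import os
import tempfile

# Create a small sample file to read from (hypothetical name and contents).
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as f:
    f.write("first line\nsecond line\nthird line\n")

# read() returns the whole file as one string.
with open(path, "r") as file1:
    file_content = file1.read()

# readline() returns one line at a time, including the trailing "\n".
with open(path, "r") as file1:
    first = file1.readline()
    second = file1.readline()

# readlines() returns a list with one string per line.
with open(path, "r") as file1:
    lines = file1.readlines()

# .name is an attribute holding the path the file was opened with.
print(file1.name)
print(file1.closed)  # True: the with block closed the file
```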
Writing files with Open
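file1 = open('file path', 'w') opens the file in write mode. The file is created if it does not exist and overwritten if it does.
file1.write("line to be written\n") writes the line to the file. write() does not add a newline by itself, so include \n at the end of each line.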
- We can use a for loop to write the contents of one file to another by opening them in read and write modes respectively.
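A minimal sketch of writing, appending and copying one file to another with a for loop. The file names source.txt and copy.txt are hypothetical and live in a temporary folder:

```python
import os
import tempfile

folder = tempfile.mkdtemp()
source_path = os.path.join(folder, "source.txt")  # hypothetical file names
copy_path = os.path.join(folder, "copy.txt")

# 'w' creates the file (or overwrites it); write() does not add newlines itself.
with open(source_path, "w") as file1:
    for line in ["line 1", "line 2", "line 3"]:
        file1.write(line + "\n")

# 'a' appends to the end of the existing file instead of overwriting it.
with open(source_path, "a") as file1:
    file1.write("line 4\n")

# Copy: open the source in read mode and the target in write mode,
# then loop over the source line by line.
with open(source_path, "r") as read_file, open(copy_path, "w") as write_file:
    for line in read_file:
        write_file.write(line)

with open(copy_path) as f:
    copied = f.read()
print(copied)
```

Iterating directly over the file object reads one line at a time, so this also works for files too large to load with read().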
Loading and Saving data with Pandas
- Pandas is a useful and widely used library for data analysis.
- Loading data from files manually, as shown above, can be a nightmare. It is best kept as a last resort for files that do not follow a regular pattern, so-called spaghetti files.
import pandas as pd imports the library we need.
- Loading a csv into Python is as simple as df = pd.read_csv('file path')
- The data from the csv file is loaded into a pandas data frame, which is a table with column headers and a row index.
- We can also create a data frame from a dictionary. The keys become the column headers, while each value (a list) holds that column's entries, one per row.
df = pd.DataFrame(dict1)
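A small sketch showing that read_csv and a dictionary produce the same data frame. The sample table is made up, and the csv text is held in memory with io.StringIO so the example runs without a file; with a real file you would pass its path instead:

```python
import io
import pandas as pd

# Made-up csv contents; pd.read_csv('file path') works the same way on disk.
csv_text = """name,age,city
Ana,34,Lisbon
Ben,28,Oslo
Chen,41,Lima
"""
df_csv = pd.read_csv(io.StringIO(csv_text))

# Same table from a dictionary: keys become column headers,
# and each list holds that column's values, one per row.
dict1 = {
    "name": ["Ana", "Ben", "Chen"],
    "age": [34, 28, 41],
    "city": ["Lisbon", "Oslo", "Lima"],
}
df = pd.DataFrame(dict1)

print(df)
print(df.equals(df_csv))  # True: both hold the same data
```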
df[["columnName1", "columnName2"]] selects the listed columns and returns them as a new data frame for analysis. This lets us leave out columns we do not need, such as unwanted categorical data.
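A quick sketch of column selection on a made-up data frame, contrasting double brackets (a list of names, giving a data frame) with single brackets (one name, giving a Series):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Chen"],
    "age": [34, 28, 41],
    "city": ["Lisbon", "Oslo", "Lima"],
})

# Double brackets (a list of column names) return a new DataFrame.
subset = df[["name", "age"]]

# Single brackets with one name return a Series instead.
ages = df["age"]

print(type(subset).__name__, list(subset.columns))  # DataFrame ['name', 'age']
print(type(ages).__name__)                          # Series
```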
- numpy also provides
numpy.loadtxt() to load homogeneous data (all values of the same type) from a file into an array.
numpy.genfromtxt() to load simple heterogeneous files with column headers and different data types; it can also handle missing values.
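A minimal sketch of the two numpy loaders, using made-up data held in io.StringIO objects in place of real files:

```python
import io
import numpy as np

# Homogeneous numeric data: loadtxt returns a plain 2-D float array.
numeric_text = io.StringIO("1 2 3\n4 5 6\n")
arr = np.loadtxt(numeric_text)

# Mixed types with a header row: names=True reads the column names from the
# first line, and dtype=None lets genfromtxt infer a type per column.
mixed_text = io.StringIO("name,age\nAna,34\nBen,28\n")
table = np.genfromtxt(mixed_text, delimiter=",", names=True,
                      dtype=None, encoding="utf-8")

print(arr.shape)          # (2, 3)
print(table["age"])       # [34 28]
print(table.dtype.names)  # ('name', 'age')
```

The result of genfromtxt here is a structured array, so columns are accessed by name much like a data frame, but without the extra tools pandas offers.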
pd.read_csv() is the most flexible and most commonly used option.
loc[] and iloc[] are used with a pandas data frame to select parts of it for analysis.
df.loc[0, 'columnName1'] uses the row index label and the column name as inputs (with the default index, the labels are simply 0, 1, 2, ...), while
df.iloc[0, 1] takes the integer row and column positions.
loc and iloc can also be used for slicing, for example df.iloc[0:2, 0:3] selects the first two rows and the first three columns.
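A short sketch of loc and iloc on a made-up data frame with the default integer index. It also shows one difference worth remembering when slicing: iloc excludes the end position, while loc includes the end label.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Chen", "Dee"],
    "age": [34, 28, 41, 25],
    "city": ["Lisbon", "Oslo", "Lima", "Rome"],
})

# loc: row index label + column name (default labels are 0, 1, 2, ...).
print(df.loc[0, "name"])   # Ana

# iloc: integer positions for both row and column.
print(df.iloc[0, 1])       # 34

# Slicing: iloc stops before the end position, loc includes the end label.
first_two = df.iloc[0:2, 0:2]           # rows 0-1, columns 'name' and 'age'
with_end = df.loc[0:2, "name":"age"]    # rows 0-2, columns 'name' through 'age'
print(first_two.shape, with_end.shape)  # (2, 2) (3, 2)
```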
df['col_name'].unique() returns all the unique values of a column.
df[condition] returns all the rows that match the condition as a new data frame, for example df[df['col_name'] > 5].
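A minimal sketch of unique() and boolean filtering on a made-up data frame:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ben", "Chen", "Dee"],
    "age": [34, 28, 41, 25],
    "city": ["Lisbon", "Oslo", "Rome", "Lisbon"],
})

# unique() returns the distinct values in order of first appearance.
cities = df["city"].unique()
print(list(cities))  # ['Lisbon', 'Oslo', 'Rome']

# A condition on a column gives a boolean Series;
# indexing df with it keeps only the matching rows.
older = df[df["age"] > 30]
print(older["name"].tolist())  # ['Ana', 'Chen']

# Combine conditions with & (and) or | (or); each needs its own parentheses.
older_in_lisbon = df[(df["age"] > 30) & (df["city"] == "Lisbon")]
print(older_in_lisbon["name"].tolist())  # ['Ana']
```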
- To save a data frame to a csv file we can use df.to_csv('file path')
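A small round-trip sketch: save a made-up data frame with to_csv and load it back with read_csv. The output path is hypothetical and points into a temporary folder:

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"name": ["Ana", "Ben"], "age": [34, 28]})

# Hypothetical output path in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "output.csv")

# index=False leaves out the row index, so it does not come back as an extra column.
df.to_csv(path, index=False)

reloaded = pd.read_csv(path)
print(reloaded.equals(df))  # True
```

Without index=False, the index is written as an unnamed first column, and reading the file back would add it as an extra column.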
We can have a discussion here https://www.linkedin.com/in/jayeshrao