Python: Working with Data
2 min read · Sep 30, 2020
Reading files with Open
file1 = open('file path', 'mode')
is used to open the file you want to load.
- Mode can be 'r' for reading, 'w' for writing (which creates the file or overwrites an existing one), or 'a' for appending, that is, adding to the end of an existing file instead of overwriting it.
- Always remember to close the file,
file1.close()
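- You can use
file1.name
to get the file name. Note that name is an attribute, not a method, so it takes no parentheses.
- Using
with open('file path', 'mode') as file1: fileContent = file1.read()
is a better practice as it automatically closes the file.
- fileContent = file1.readline()
reads a single line of the file, while
fileContent = file1.readlines()
reads every line and stores them in a list.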
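To make the difference between the reading calls concrete, here is a minimal sketch. The sample file and its contents are hypothetical and are created in a temporary folder only so that the example can run on its own.

```python
import os
import tempfile

# Hypothetical sample file, created here only so the sketch is runnable.
sample_path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(sample_path, "w") as f:
    f.write("first line\nsecond line\nthird line\n")

# read() returns the whole file as a single string.
with open(sample_path, "r") as file1:
    whole_text = file1.read()

# readline() returns only the next line; readlines() returns the
# remaining lines as a list of strings.
with open(sample_path, "r") as file1:
    file_name = file1.name            # attribute, so no parentheses
    first_line = file1.readline()
    remaining_lines = file1.readlines()

# The with block has already closed the file at this point.
print(file1.closed)
```

Each line keeps its trailing newline character, so a call such as line.strip() is often useful before further processing.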

Writing files with Open
file1 = open('file path', 'w')
is used to open the file in write mode.
file1.write("line to be written\n")
will write the line to the file.
- We can use a for loop to write the contents of one file to another by opening them in read and write modes respectively.
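The for loop copy described above could look roughly like this. The file names are hypothetical and live in a temporary folder so the sketch does not touch any real files.

```python
import os
import tempfile

# Hypothetical file names inside a temporary folder.
work_dir = tempfile.mkdtemp()
source_path = os.path.join(work_dir, "source.txt")
target_path = os.path.join(work_dir, "copy.txt")

# 'w' creates the file (or overwrites an existing one).
with open(source_path, "w") as source:
    source.write("alpha\n")
    source.write("beta\n")

# Open one file for reading and the other for writing, then copy line by line.
with open(source_path, "r") as source, open(target_path, "w") as target:
    for line in source:
        target.write(line)

# 'a' adds to the end of the existing file instead of replacing its contents.
with open(target_path, "a") as target:
    target.write("gamma\n")

with open(target_path, "r") as target:
    copied_lines = target.readlines()

print(copied_lines)
```

Note that write() does not add a newline for you, which is why each string ends with "\n".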

Loading and Saving data with Pandas
- Pandas is a useful and common library for Data Analysis.
- Manually loading data from files, as seen above, can be a nightmare. It is a last resort for files that follow no regular pattern, the so-called spaghetti files.
import pandas as pd
imports the library we need.
- Loading a CSV into Python is as simple as
df = pd.read_csv("path_to_csv")
- The data from the CSV file is loaded into a pandas data frame, which is a table with column headers.
- We can create a data frame from a dictionary. The keys correspond to the column headers while the values are lists holding each column's entries.
df = pd.DataFrame(dict1)
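As a small illustration of both ways of building a data frame, the sketch below loads CSV text and then builds the same table from a dictionary. The column names and values are made up, and an in-memory io.StringIO object stands in for a real CSV file so that the example runs without any file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content; StringIO stands in for a file on disk.
csv_text = "name,age\nAda,36\nLinus,29\n"
df_from_csv = pd.read_csv(io.StringIO(csv_text))

# Each key becomes a column header; each list holds that column's values.
dict1 = {"name": ["Ada", "Linus"], "age": [36, 29]}
df_from_dict = pd.DataFrame(dict1)

print(df_from_csv)
print(df_from_dict)
```

Note that the dictionary itself is passed to pd.DataFrame, not its name as a string.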
- df[["columnName"]]
can be used to select the needed columns to form a new data frame for analysis. We can leave out unwanted categorical data.
- numpy also provides
numpy.loadtxt()
to load homogeneous data structures like arrays from files.
- numpy.genfromtxt()
loads simple heterogeneous files with column headers and different data types.
- pd.read_csv()
is the most flexible and the most widely used.
- loc[row_label, col_name]
iloc[row_no, col_no]
are used with a pandas data frame to select parts of it for analysis.
- df.loc[0, 'columnName1']
Here loc takes the row label and the column name as inputs (with the default index, the row label is simply the row number), while
df.iloc[0, 1]
takes the row and column positions.
- loc, iloc
can also be used for slicing, for example
df.iloc[1:2, 0:3]
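The selection tools above can be tried out on a tiny data frame. This is only a sketch: the column names and values are invented, and the data frame uses the default integer index, so row labels and row positions happen to coincide.

```python
import pandas as pd

# Hypothetical data with the default integer index (0, 1, 2).
df = pd.DataFrame({
    "name": ["Ada", "Linus", "Grace"],
    "age": [36, 29, 45],
    "city": ["London", "Helsinki", "New York"],
})

# Double brackets return a new data frame with only the listed columns.
names_only = df[["name"]]

# loc selects by row label and column name.
by_label = df.loc[0, "age"]

# iloc selects by row position and column position.
by_position = df.iloc[0, 1]

# loc slices include the end label; iloc slices exclude the end position.
label_slice = df.loc[0:1, "name":"age"]   # 2 rows, 2 columns
position_slice = df.iloc[1:2, 0:3]        # 1 row, 3 columns

print(by_label, by_position)
print(label_slice.shape, position_slice.shape)
```

The difference in how the two slices treat their end point is an easy source of off-by-one mistakes, so it is worth checking the shape of the result.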
- df['col_name'].unique()
will display all the unique values of a column.
- df[condition]
where condition is a boolean test such as df['col_name'] > 0, will return all the rows that match the condition as a new data frame.
- To save a data frame to a file we can use
df.to_csv("fileName.csv")
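Putting the last few steps together, a minimal sketch might look like this. The data, the age threshold, and the temporary output folder are assumptions made only for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "city": ["London", "Helsinki", "London"],
    "age": [36, 29, 45],
})

# All distinct values of one column, in order of first appearance.
unique_cities = df["city"].unique()

# The inner expression builds a True/False mask; the outer df[...] keeps
# only the rows where the mask is True, giving a new data frame.
over_30 = df[df["age"] > 30]

# to_csv is a method of the data frame; index=False leaves out the row labels.
out_path = os.path.join(tempfile.mkdtemp(), "fileName.csv")
over_30.to_csv(out_path, index=False)

# Read the file back to check what was saved.
reloaded = pd.read_csv(out_path)
print(reloaded)
```

Without index=False, the row labels are written as an extra unnamed first column, which then shows up again when the file is read back.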
We can have a discussion here https://www.linkedin.com/in/jayeshrao