Data Handling using Pandas

PrevNextBack

Data Handling using Pandas

Fill in the Blanks

Question 1

Pandas is the most popular open-source Python library used for doing data analysis.

Question 2

In order to work with Pandas in Python, you need to import pandas library in your program.

Question 3

The two basic and universally popular data structures of Pandas are Series and DataFrame.

Question 4

A Series is a Pandas data structure that represents a one-dimensional array-like object of indexed data.

Question 5

To create a series object, Series() method is used.

Question 6

Missing data in Pandas series and DataFrames can be filled with a NaN-Not a Number value.

Question 7

DataFrame has two indices.

Question 8

Selecting a subset from a DataFrame requires loc and iloc functions.

Question 9

read_csv() function is used to read data from a CSV file in your DataFrame.

Question 10

to_csv() function saves the data of DataFrame to a CSV file.

State True or False

Question 1

A series object is 2D array that stores ordered collection columns that can store data of different type.

Answer

False

Reason — A DataFrame object is 2D labelled array like Pandas data structure that stores an ordered collection columns that can store data of different types.

Question 2

A DataFrame is a 1D array-like object containing an array of data and an associated array of data labels.

Answer

False

Reason — A Series is a Pandas data structure that represents a one-dimensional array like object containing an array of data and an associated array of data labels, called its index.

Question 3

To access subset of a dataframe, we can use loc[] and iloc[] attributes.

Answer

True

Reason — To access a subset of a DataFrame in pandas, we can utilize the loc[] and iloc[] attributes, where loc[] is label-based and iloc[] is integer index-based.

Question 4

The iteritems() iterate over vertical subset in the form of (col_index,series) pair.

Answer

True

Reason — The iteritems() method in pandas iterates over a DataFrame column-wise, where each vertical subset is in the form of (column-index, Series) where Series contains all row values for that column-index.

Question 5

The value NA/NAT/None are the same in Pandas and considered as NaN-values.

Answer

True

Reason — In Pandas, 'NA' (Not Available) and 'NAT' (Not A Time) are used to represent missing values in numeric and datetime columns, respectively. 'None' is used to represent missing values in object columns. Although they are used for different data types, they are all considered as NaN (Not a Number) values.

Question 6

The iteritems() brings horizontal subsets from a dataframe.

Answer

False

Reason — The iteritems() method in pandas iterates over a DataFrame column-wise, where each vertical subset is in the form of (column-index, Series) where Series contains all row values for that column-index.

Question 7

The all() and any() functions are used to check if all or any item is non-zero, not-empty or not-False.

Answer

True

Reason — The all() and any() functions are used to check if all or any item is non-zero, not-empty or not-False.

Question 8

CSV refers to tabular data saved as plain text where data values are separated by commas.

Answer

True

Reason — CSV stands for Comma Separated Values. It is a file format used to store tabular data, such as tables or spreadsheets, as plain text. In a CSV file, each line represents a single row of data, and each value in that row is separated from the next value by a comma (,).

Question 9

read_csv() method automatically takes the last row of the CSV file.

Answer

False

Reason — The read_csv() method in pandas does not automatically take the last row of the CSV file. Instead, it reads the entire CSV file into a DataFrame.

Question 10

Data and index in an ndarray must be of the same length.

Answer

True

Reason — In a pandas DataFrame or Series, the data and index should be of the same length. Each element in the data should have a corresponding label in the index.

Question 11

We need to define an index in Pandas.

Answer

False

Reason — In Pandas, an index is automatically created when a DataFrame or Series is created. By default, the index is a range of integers starting from 0, incrementing by 1 for each row.

Multiple Choice Questions

Question 1

Which of the following commands is used to install Pandas?

  1. pip install python-pandas
  2. pip install pandas
  3. python install python
  4. python install pandas

Answer

pip install pandas

Reason — The correct command to install Pandas is pip install pandas.

Question 2

A two-dimensional labelled array that is an ordered collection of columns to store heterogeneous data types is:

  1. Series
  2. NumPy array
  3. Dataframe
  4. Panel

Answer

Dataframe

Reason — A DataFrame object is 2D labelled array like Pandas data structure that stores an ordered collection of columns that can store heterogeneous data types.

Question 3

In a dataframe, axis = 0 is for

  1. Columns
  2. Rows
  3. Rows and Columns both
  4. None of these

Answer

Rows

Reason — The axis 0 identifies a DataFrame's row index.

Question 4

Which of the following statements is false?

  1. Dataframe is size-mutable.
  2. Dataframe is values-mutable.
  3. Dataframe is immutable.
  4. Dataframe is capable of holding multiple types of data.

Answer

Dataframe is immutable.

Reason — DataFrame is value-mutable, size-mutable and stores heterogeneous data types.

Question 5

Which attribute of dataframe is used to perform the transpose operation on a dataframe?

  1. T
  2. Ndim
  3. Empty
  4. Shape

Answer

T

Reason — The T attribute of DataFrame is used to perform transpose operation on a DataFrame. The syntax is : <DataFrame>.T.

Question 6

Which attribute of dataframe is used to retrieve its shape?

  1. T
  2. Ndim
  3. Empty
  4. Shape

Answer

Shape

Reason — The shape attribute is used to retrieve DataFrame's shape.

Question 7

Which attribute of dataframe is used to get number of axes?

  1. T
  2. Ndim
  3. Empty
  4. Shape

Answer

Ndim

Reason — The ndim attribute of dataframe is used to get number of axes/array dimensions.

Question 8

CSV stands for:

  1. Comma Separated Values
  2. Comma Separated Variables
  3. Column Separated Values
  4. Column Separated Variables

Answer

Comma Separated Values

Reason — The acronym CSV is short for Comma-Separated Value.

Question 9

Which of the following can be used to specify the data while creating a dataframe?

  1. Series
  2. List of Dictionaries
  3. Structured ndarray
  4. All of these

Answer

All of these

Reason — We can create a DataFrame object by passing data in many different ways, such as two-dimensional dictionaries (i.e., dictionaries having lists or dictionaries or ndarrays or series objects etc), two-dimensional ndarrays, series type object and another DataFrame object.

Question 10

Which of the following commands shows the information with city="Delhi" from dataframe SHOP?

  1. print(SHOP[City == 'Delhi'])
  2. print(SHOP[SHOP.City == 'Delhi])
  3. print(SHOP[SHOP.'City' == 'Delhi'])
  4. print(SHOP[SHOP[City] == 'Delhi'])

Answer

print(SHOP[SHOP.City == 'Delhi'])

Reason — The correct code print(SHOP[SHOP.City == 'Delhi']) filters the SHOP DataFrame to show only the rows where the City column is equal to 'Delhi'. It does this by creating a boolean mask SHOP.City == 'Delhi' that returns True for rows where the city is 'Delhi' and False otherwise, and then using this mask to select the corresponding rows from the original DataFrame using SHOP[]. The resulting DataFrame, which contains only the rows that match the condition, is then printed to the console.

Question 11

The following statement will ............... .

df = df.drop(['Name', 'Class', 'Rollno'], axis = 1) #df is a DataFrame object
  1. delete three columns having labels 'Name', 'Class' and `Rollno'
  2. delete three rows having labels 'Name', 'Class' and 'Rollno'
  3. delete any three columns
  4. return error

Answer

delete three columns having labels 'Name', 'Class' and `Rollno'

Reason — The drop() function is used to remove rows from a DataFrame. In this case, the axis=1 parameter specifies that we want to drop columns. The list ['Name', 'Class', 'Rollno'] contains the labels of the columns to be dropped. Therefore, the statement will delete the three columns with labels 'Name', 'Class', and 'Rollno' from the DataFrame df.

Assertions and Reasons

Question 1

Assertion (A): Pandas is a Python library.

Reasoning (R): Pandas is a powerful, flexible and easy to use open source data analysis library.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

Both A and R are true and R is the correct explanation of A.

Explanation
Pandas is a Python library that makes data analysis easy and effective. It is the most famous Python library for data science, offering powerful and flexible data structures that facilitate data analysis and manipulation. As an open-source library, Pandas provides high-performance, easy-to-use data structures and data analysis tools.

Question 2

Assertion (A): A series stores data row-wise.

Reasoning (R): A series is a one-dimensional labelled data structure.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

A is false but R is true.

Explanation
A series in pandas does not store data row-wise. It is a one-dimensional array like object containing an array of data and an associated array of data labels, called its index.

Question 3

Assertion (A): Dataframe has both a row and column index.

Reasoning (R): Dataframe is a two-dimensional labelled data structure like a table of MySQL.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

Both A and R are true and R is the correct explanation of A.

Explanation
A DataFrame in Pandas has both a row index and a column index. It is a two-dimensional labeled data structure, similar to a table in MySQL, each value is identifiable with the combination of row and column indices.

Question 4

Assertion (A): While creating a series using scalar values, index must be provided.

Reasoning (R): The scalar value is repeated to match the length of index.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

Both A and R are true and R is the correct explanation of A.

Explanation
When creating a Series using scalar values, we must provide an index. This is because a Series requires an index to associate with each value. This scalar value shall be repeated to match the length of the index.

Question 5

Assertion (A): The shape attribute returns the number of rows and number of columns available in dataframe.

Reasoning (R): The shape attribute returns the values in the form of list.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

A is true but R is false.

Explanation
The shape attribute of a Pandas DataFrame returns the number of rows and the number of columns in the DataFrame. The shape attribute returns the values in the form of tuple.

Question 6

Assertion (A): After running the following code:

df = pd.DataFrame([11,46], index = ['True', 'False'])
print(df[True])

A key error will be produced.

Reasoning (R): Dataframe does not support Boolean Indexing.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

A is true but R is false.

Explanation
DataFrames do support Boolean Indexing, which allows to select rows based on a Boolean condition. The code df[True] is trying to access a column named True, which does not exist in the DataFrame. The index of the DataFrame is ['True', 'False']. To access the row where the index is 'True', we would use df.loc['True']. This is an example of label-based indexing, where we are selecting a row based on its index label.

Question 7

Assertion (A): We can add a new column in an existing dataframe using .at or .loc methods.

Reasoning (R): When we reassign new values to an existing column in a dataframe, the previous values are overwritten.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

Both A and R are true but R is not the correct explanation of A.

Explanation
We can add a new column in an existing dataframe using at or loc methods. When we reassign new values to an existing column in a dataframe, the previous values are overwritten.

Question 8

Assertion (A): When a dataframe is created using dictionary, the keys will be the columns and corresponding values will be the rows of the dataframe.

Reasoning (R): NaN values are automatically filled for values of non-matching keys while creating a dataframe using dictionary.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

Both A and R are true and R is the correct explanation of A.

Explanation
When a dataframe is created using a dictionary, the keys of the dictionary become the columns of the dataframe, and the values of the dictionary become the rows of the dataframe. If the dictionaries used to create the dataframe have non-matching keys, NaN values will be automatically filled in for the missing values.

Case/Source Based Questions

Question 1

Inayra is writing a program using Pandas library for creating a dataframe from two dictionaries. Given below is the snippet she has developed. Help her to complete it by selecting the correct option given as under:

import ............... as pd            # Statement 1
dict = {'x': [10,25], 'y' : [32,93] }
dict 1 = {'x',: [14,56], 'y': [36,72] } 
df = pd.DataFrame (dict)
df1 = pd. ............... (dict)        # Statement 2
df2 = pd. ............... ([df, df1]) # Statement 3

1. Choose the correct option from the following for Statement 1.

    (a) pd
    (b) data
    (c) df
    (d) pandas

2. Which of the following option should be taken for Statement 2?

    (a) Series
    (b) Dataframe
    (c) DataFrame
    (d) Dictionary

3. Select the correct method from the following for Statement 3.

    (a) concat()
    (b) shape()
    (c) index()
    (d) append()

Answer

1. pandas

Reason — The pandas library is imported as pd using the statement import pandas as pd.

2. DataFrame

Reason — The pd.DataFrame() function is used to create a DataFrame from a dictionary.

3. concat()

Reason — The pd.concat() function is used to concatenate two or more DataFrames into a single DataFrame. In this case, it is used to concatenate df and df1 into a single DataFrame df2.

Solutions to Unsolved Questions

Question 1

What is the significance of Pandas Library?

Answer

The significance of Python Pandas library is as follows:

  1. It can read or write in many different data formats (integer, float, double, etc.).
  2. It can calculate in all the possible ways data is organized i.e., across rows and down columns.
  3. It can easily select subsets of data from bulky data sets and even combine multiple datasets together. It has functionality to find and fill missing data.
  4. It allows to apply operations to independent groups within the data.
  5. It supports reshaping of data into different forms.
  6. It supports advanced time-series functionality.
  7. It supports visualization by integrating matplotlib and seaborn etc. libraries.

Question 2

Name some common data structures of Python's Pandas Library.

Answer

The common data structures of Python's Pandas library are Series and DataFrame.

Question 3

Name the function to iterate over a dataframe horizontally.

Answer

The iterrows() function iterates over a DataFrame horizontally.

Question 4

Name the function to iterate over a dataframe vertically.

Answer

The iteritems() function iterates over a DataFrame vertically.

Question 5

What is CSV file?

Answer

CSV (Comma Separated Values) files are delimited files that store tabular data (data stored in rows and columns as we see in spreadsheets or databases) where comma delimits every value. Each line in a CSV file is a data record. Each record consists of one or more fields, separated by commas (or the chosen delimiter).

Question 6

What is the use of nrows argument in read_csv() method?

Answer

The nrows argument in the read_csv() method is used to specify the number of rows to read from the CSV file.

Question 7

How can we create CSV file? Explain with steps.

Answer

The steps to create CSV files are as follows:

  1. Launch Microsoft Excel.
  2. Type the data given in Table in the Excel sheet.
  3. Save the file with a proper name by clicking File -> Save or Save As or press Ctrl + S to open the Save As window.
  4. Type the name of the file and select file type as CSV from the drop-down arrow.
  5. Click the Save button. Excel will ask for confirmation to select CSV format.
  6. Click OK.
  7. It will display a dialog box asking permission to keep comma as delimiter for CSV file.
  8. Lastly, click Yes to retain and save the Excel file in CSV format.

Question 8

How do you iterate over a dataframe? Explain with the help of code snippet.

Answer

Iterating over rows in the DataFrame:

The iterrows() method is used to iterate over each row in the DataFrame. In this method, each horizontal subset is in the form of (row-index, series), where the series contains all column values for that row-index.

For example :

import pandas as pd
total_sales = {2015 : {'Qtr1' : 34500, 'Qtr2' : 45000}, 
               2016 : {'Qtr1' : 44500, 'Qtr2' : 65000}}
df = pd.DataFrame(total_sales)
for (row, rowseries) in df.iterrows():
    print("RowIndex :", row)
    print('Containing :')
    print(rowseries)
Output
RowIndex : Qtr1
Containing :
2015    34500
2016    44500
Name: Qtr1, dtype: int64
RowIndex : Qtr2
Containing :
2015    45000
2016    65000
Name: Qtr2, dtype: int64

Iterating over columns in the DataFrame:

The iteritems() method is used to iterate over each column in the DataFrame. In this method, each vertical subset is in the form of (column-index, series), where the series contains all row values for that column-index.

For example :

import pandas as pd
total_sales = {2015 : {'Qtr1' : 34500, 'Qtr2' : 45000}, 
               2016 : {'Qtr1' : 44500, 'Qtr2' : 65000}}
df = pd.DataFrame(total_sales)
for (col, colseries) in df.iteritems():
    print("Column Index :", col)
    print('Containing :')
    print(colseries)
Output
Column Index : 2015
Containing :
Qtr1    34500
Qtr2    45000
Name: 2015, dtype: int64
Column Index : 2016
Containing :
Qtr1    44500
Qtr2    65000
Name: 2016, dtype: int64

Question 9

Write commands to print the following details of a series object seal.

(a) If the series is empty

(b) Indexes of the series

(c) The data type of underlying data

(d) If the series stores any NaN values

Answer

(a)

seal.empty

(b)

seal.index

(c)

seal.dtype

(d)

seal.hasnans

Question 10

How do you fill all missing values with previous non-missing values?

Answer

To fill all missing values in a pandas DataFrame with the previous non-missing values, we can use the fillna method with the ffill parameter. The syntax is df.fillna(method = 'ffill', inplace = True)

Question 11

Consider the following tables Item and Customer and answer the questions that follow:

Table: Item

Item_IDItemNameManufacturerPrice
PC01Personal ComputerHCL India42000
LCO5LaptopHP USA55000
PCO3Personal ComputerDell USA32000
PC06Personal ComputerZenith USA37000
LCO3LaptopDell USA57000

Table: Customer

Item_IDCustomerNameCity
LCO3N RoyDelhi
PCO3H SinghMumbai
PC06R PandeyDelhi
LCO3C SharmaChennai
PC01K AgarwalBengaluru

Assume that the Pandas has been imported as pd.

(a) Create a dataframe called dfI for table Item.

(b) Create a dataframe called dfC for table Customer.

(c) Perform the default join operation on item_ID using two dataframes: dfI and dfC.

(d) Perform the left join operation on item_ID using two dataframes: dfI and dfC.

(e) Perform the right join operation on Item_ID using two dataframes: dfI and dfC.

(f) Perform the default operation on Item_ID using two dataframes: dfI and dfC with the left index as true.

(g) Perform the outer join operation on item_ID using two dataframes: dfI and dfC.

(h) Create a new dataframe dfN using dataframes: dfI and dfC. The new dataframe data will hold both left index and right index true values.

(i) Arrange the dataframe dfN in descending order of Price.

(j) Arrange the dataframe dfN in descending order of City and Price.

Answer

(a)

dfI = pd.DataFrame({
    'Item_ID': ['PC01', 'LCO5', 'PCO3', 'PC06', 'LCO3'],
    'ItemName': ['Personal Computer', 'Laptop', 'Personal Computer', 'Personal Computer', 'Laptop'],
    'Manufacturer': ['HCL India', 'HP USA', 'Dell USA', 'Zenith USA', 'Dell USA'],
    'Price': [42000, 55000, 32000, 37000, 57000]
})
Output
  Item_ID           ItemName Manufacturer  Price
0    PC01  Personal Computer    HCL India  42000
1    LCO5             Laptop       HP USA  55000
2    PCO3  Personal Computer     Dell USA  32000
3    PC06  Personal Computer   Zenith USA  37000
4    LCO3             Laptop     Dell USA  57000

(b)

dfC = pd.DataFrame({
    'Item_ID': ['LCO3', 'PCO3', 'PC06', 'LCO3', 'PC01'],
    'CustomerName': ['N Roy', 'H Singh', 'R Pandey', 'C Sharma', 'K Agarwal'],
    'City': ['Delhi', 'Mumbai', 'Delhi', 'Chennai', 'Bengaluru']
})
Output
  Item_ID CustomerName       City
0    LCO3        N Roy      Delhi
1    PCO3      H Singh     Mumbai
2    PC06     R Pandey      Delhi
3    LCO3     C Sharma    Chennai
4    PC01    K Agarwal  Bengaluru

(c)

df_default_join = pd.merge(dfI, dfC, on = 'Item_ID')
Output
  Item_ID           ItemName Manufacturer  Price CustomerName       City
0    PC01  Personal Computer    HCL India  42000    K Agarwal  Bengaluru
1    PCO3  Personal Computer     Dell USA  32000      H Singh     Mumbai
2    PC06  Personal Computer   Zenith USA  37000     R Pandey      Delhi
3    LCO3             Laptop     Dell USA  57000        N Roy      Delhi
4    LCO3             Laptop     Dell USA  57000     C Sharma    Chennai

(d)

df_left_join = pd.merge(dfI, dfC, left_on='Item_ID', right_on='Item_ID', how='left')
Output
  Item_ID           ItemName Manufacturer  Price CustomerName       City
0    PC01  Personal Computer    HCL India  42000    K Agarwal  Bengaluru
1    LCO5             Laptop       HP USA  55000          NaN        NaN
2    PCO3  Personal Computer     Dell USA  32000      H Singh     Mumbai
3    PC06  Personal Computer   Zenith USA  37000     R Pandey      Delhi
4    LCO3             Laptop     Dell USA  57000        N Roy      Delhi
5    LCO3             Laptop     Dell USA  57000     C Sharma    Chennai

(e)

df_right_join = pd.merge(dfI, dfC, left_on='Item_ID', right_on='Item_ID', how='right')
Output
  Item_ID           ItemName Manufacturer  Price CustomerName       City
0    LCO3             Laptop     Dell USA  57000        N Roy      Delhi
1    PCO3  Personal Computer     Dell USA  32000      H Singh     Mumbai
2    PC06  Personal Computer   Zenith USA  37000     R Pandey      Delhi
3    LCO3             Laptop     Dell USA  57000     C Sharma    Chennai
4    PC01  Personal Computer    HCL India  42000    K Agarwal  Bengaluru

(f) This operation is not possible because the left_index parameter is used for merging on the index of the dataframes, not on a specific column. The default merge operation is an inner join, and it requires a common column to merge on.

(g)

df_outer_join = pd.merge(dfI, dfC, on='Item_ID', how='outer')
Output
  Item_ID           ItemName Manufacturer  Price CustomerName       City
0    LCO3             Laptop     Dell USA  57000        N Roy      Delhi
1    LCO3             Laptop     Dell USA  57000     C Sharma    Chennai
2    LCO5             Laptop       HP USA  55000          NaN        NaN
3    PC01  Personal Computer    HCL India  42000    K Agarwal  Bengaluru
4    PC06  Personal Computer   Zenith USA  37000     R Pandey      Delhi
5    PCO3  Personal Computer     Dell USA  32000      H Singh     Mumbai

(h)

dfN = pd.merge(dfI, dfC, left_index=True, right_index=True)
Output
Item_ID_x           ItemName Manufacturer  Price Item_ID_y CustomerName       City
0      PC01  Personal Computer    HCL India  42000      LCO3        N Roy      Delhi
1      LCO5             Laptop       HP USA  55000      PCO3      H Singh     Mumbai
2      PCO3  Personal Computer     Dell USA  32000      PC06     R Pandey      Delhi
3      PC06  Personal Computer   Zenith USA  37000      LCO3     C Sharma    Chennai
4      LCO3             Laptop     Dell USA  57000      PC01    K Agarwal  Bengaluru

(i)

dfN_sorted_price = dfN.sort_values(by='Price', ascending=False)
Output
  Item_ID_x           ItemName Manufacturer  Price Item_ID_y CustomerName       City
4      LCO3             Laptop     Dell USA  57000      PC01    K Agarwal  Bengaluru
1      LCO5             Laptop       HP USA  55000      PCO3      H Singh     Mumbai
0      PC01  Personal Computer    HCL India  42000      LCO3        N Roy      Delhi
3      PC06  Personal Computer   Zenith USA  37000      LCO3     C Sharma    Chennai
2      PCO3  Personal Computer     Dell USA  32000      PC06     R Pandey      Delhi

(j)

dfN_sorted_city_price = dfN.sort_values(by=['City', 'Price'], ascending=[False, False])
Output
  Item_ID_x           ItemName Manufacturer  Price Item_ID_y CustomerName       City
1      LCO5             Laptop       HP USA  55000      PCO3      H Singh     Mumbai
0      PC01  Personal Computer    HCL India  42000      LCO3        N Roy      Delhi
2      PCO3  Personal Computer     Dell USA  32000      PC06     R Pandey      Delhi
3      PC06  Personal Computer   Zenith USA  37000      LCO3     C Sharma    Chennai
4      LCO3             Laptop     Dell USA  57000      PC01    K Agarwal  Bengaluru

Question 12

Consider the following series object namely S:

0 0.430271 
1 0.617328 
2 0.265421 
3 0.836113 
dtype: float64

What will be returned by the following statements?

(a) S * 100

(b) S > 0

(c) S1 = pd.Series(S)

(d) S3 = pd.Series(S1) + 3

Answer

(a) S * 100

Output
0    43.0271
1    61.7328
2    26.5421
3    83.6113
dtype: float64

(b) S > 0

Output
0    True
1    True
2    True
3    True
dtype: bool

(c) S1 = pd.Series(S)

Output
0    0.430271
1    0.617328
2    0.265421
3    0.836113
dtype: float64

(d) S3 = pd.Series(S1) + 3

Output
0    3.430271
1    3.617328
2    3.265421
3    3.836113
dtype: float64

Question 13

What will be the output produced by the following code?

Stationery = ['pencils', 'notebooks', 'scales', 'erasers']
S = pd.Series([20, 33, 52, 10], index = Stationery)
S2 = pd.Series([17, 13, 31, 32], index = Stationery)
print(S == S2)
S = S + S2
print(S)

Answer

Output
pencils      False
notebooks    False
scales       False
erasers      False
dtype: bool
pencils      37
notebooks    46
scales       83
erasers      42
dtype: int64
Explanation

The code creates two pandas Series, S and S2, with the index Stationery which is a list. The code then compares S and S2 element-wise using the "==" operator, which returns a boolean Series indicating whether each pair of values is equal. Finally, the code adds S and S2 element-wise using the "+" operator, which returns a new Series with the summed values, and assigns the result back to S. The resulting S Series will have the same index as before i.e., Stationery.

Question 14

What will be the output produced by the following codes, considering the Series object S given in Q.13?

(a) print(S[1:4])

(b) print(S[:1])

(c) print(S[0:2])

(d) S[0:2] = 12
     print(S)

(e) print(S.index)

(f) print(S.values)

The Series object 'S' is as follows:

pencils      20
notebooks    33
scales       52
erasers      10
dtype: int64

Answer

(a) print(S[1:4])

Output
notebooks    33
scales       52
erasers      10
dtype: int64
Explanation

The slice S[1:4] starts at index 1 and ends at index 3, hence, it includes three elements i.e., elements from index 1 and 3.

(b) print(S[:1])

Output
pencils    20
dtype: int64
Explanation

The slice S[:1] starts at index 0 and ends at index 1, but because the end index is exclusive, it includes only one element i.e., the element at index 0.

(c) print(S[0:2])

Output
pencils      20
notebooks    33
dtype: int64
Explanation

The slice S[0:2] starts at index 0 and ends at index 1, hence, it includes two elements i.e., elements from index 0 and 1.

(d) S[0:2] = 12
     print(S)

Output
pencils      12
notebooks    12
scales       52
erasers      10
dtype: int64
Explanation

The slice S[0:2] = 12 assigns the value 12 to indices 0 and 1 in Series S, directly modifying those elements. The updated Series is then printed.

(e) print(S.index)

Output
Index(['pencils', 'notebooks', 'scales', 'erasers'], dtype='object')
Explanation

The code print(S.index) displays the indices of Series S.

(f) print(S.values)

Output
[12 12 52 10]
Explanation

The code print(S.values) displays the values of Series S.

Question 15

Write a program to iterate and print a dataframe column-wise and print only first three columns.

Solution
import pandas as pd

data = {
    'Name': ['Aliya', 'Hemanth', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['Bangalore', 'Chennai', 'Mumbai'],
    'Salary': [50000, 60000, 70000]
}

df = pd.DataFrame(data)
first_three_columns = df.iloc[:, :3]

print("Each column:")
for column_name in first_three_columns:
    column_data = first_three_columns[column_name]
    print(column_name)
    print(column_data)
Output
Each column:
Name
0      Aliya
1    Hemanth
2    Charlie
Name: Name, dtype: object
Age
0    25
1    30
2    35
Name: Age, dtype: int64
City
0    Bangalore
1      Chennai
2       Mumbai
Name: City, dtype: object

Question 16

Write a program to iterate and print a dataframe row-wise at a time and print only first five rows.

Solution
import pandas as pd
data = {
    'Name': ['Amruta', 'Harsh', 'Yogesh', 'Shreya', 'Zoya', 'Nyra'],
    'Age': [25, 30, 35, 40, 45, 28],
    'City': ['Chandigarh', 'Jaipur', 'Dehradun', 'Delhi', 'Vadodara', 'Guwahati']
}

df = pd.DataFrame(data)
first_five_rows = df.head(5)
print("Each row:")
for index, row in first_five_rows.iterrows():
    print("Index:", index)
    print(row)
Output
Each row:
Index: 0
Name        Amruta
Age             25
City    Chandigarh
Name: 0, dtype: object
Index: 1
Name     Harsh
Age         30
City    Jaipur
Name: 1, dtype: object
Index: 2
Name      Yogesh
Age           35
City    Dehradun
Name: 2, dtype: object
Index: 3
Name    Shreya
Age         40
City     Delhi
Name: 3, dtype: object
Index: 4
Name        Zoya
Age           45
City    Vadodara
Name: 4, dtype: object

Question 17(a)

Find the error in the following code fragments:

S2 = pd.Series([101, 102, 1-2, 104])
print (S2.index)
S2.index = [0.1.2.3, 4, 5]
S2[5] = 220
print (S2)

Answer

S2 = pd.Series([101, 102, 1-2, 104]) 
print(S2.index)
S2.index = [0.1.2.3, 4, 5] #Error 1
S2[5] = 220
print(S2)

Error 1 — In the line S2.index = [0.1.2.3, 4, 5], the index values are not separated by commas. It should be S2.index = [0, 1, 2, 3, 4, 5]. Additionally, the Series S2 initially has four elements, so assigning a new index list of six elements ([0, 1, 2, 3, 4, 5]) to S2.index will raise a ValueError because the new index list length does not match the length of the Series.

The corrected code is:

S2 = pd.Series([101, 102, 1-2, 104])
print(S2.index)
S2.index = [0, 1, 2, 3]
S2[5] = 220
print(S2)

Question 17(b)

Find the error in the following code fragments:

S = pd.Series(2, 3, 4, 55, index = range (4))

Answer

In the above code fragment, the data values should be enclosed in square brackets [] to form a list.

The corrected code is:

S = pd.Series([2, 3, 4, 55], index = range(4))

Question 17(c)

Find the error in the following code fragments:

S1 = pd.Series(1, 2, 3, 4, index = range(7))

Answer

In the above code fragment, the data values should be enclosed in square brackets to form a list and the specified index range range(7) is out of range for the provided data [1, 2, 3, 4]. Since there are only four data values, the index should have a length that matches the number of data values.

The corrected code is:

S1 = pd.Series([1, 2, 3, 4], index = range(4))

Question 17(d)

Find the error in the following code fragments:

S2 = pd.Series([1, 2, 3, 4, 5], index = range(4))

Answer

The error in the code fragment is that the length of the data list [1, 2, 3, 4, 5] does not match the length of the index range(4). Since there are only five data values, the index should have a length that matches the number of data values.

The corrected code is:

S1 = pd.Series([1, 2, 3, 4, 5], index = range(5))

Question 18

Find the error:

data = np.array(['a', 'b', 'c', 'd', 'e', 'f'])
s = pd.Series (data, index=[100, 101, 102, 103, 104, 105])
print(s[102,103,104])

Answer

The error in the above code is in the line print(s[102, 103, 104]). When accessing elements in a pandas Series using square brackets, we should use a list of index values, not multiple separate index values separated by commas.

The corrected code is:

data = np.array(['a', 'b', 'c', 'd', 'e', 'f'])  
s = pd.Series(data, index = [100, 101, 102, 103, 104, 105])  
print(s[[102, 103, 104]]) 

Question 19

Why does the following code cause error?

s1 = pd.Series(range 1, 15, 5), index = list('ababa')
print(s1['ab'])

Answer

The statement s1['ab'] causes an error because 'ab' is not a single key in the index. The index has individual keys 'a' and 'b', but not 'ab'.

Question 20

Consider the following Class12.csv file containing the data as given below:

RollNoNameAccountsMathsBStIPEco
10Ritu Jain8867879756
11Mridul Mehta6778778790
12Divij8789788292
13Yashvi Verma6782.376.598.278.6
14Deepak Virmani56.776.5887867
15Jatin Malik76667787.567.5

(a) Read the csv file into a dataframe df which is stored with tab ('\t') separator.

(b) Write the code to find the total marks (Total_ marks) for each student and add it to the newly-created dataframe.

(c) Also calculate the percentage obtained by each student under a new column “Average” in the dataframe.

Answer

(a)

df = pd.read_csv('Class12.csv', sep='\t')
Output
   RollNo            Name  Accounts  Maths   BSt    IP   Eco
0      10       Ritu Jain      88.0   67.0  87.0  97.0  56.0
1      11    Mridul Mehta      67.0   78.0  77.0  87.0  90.0
2      12           Divij      87.0   89.0  78.0  82.0  92.0
3      13    Yashvi Verma      67.0   82.3  76.5  98.2  78.6
4      14  Deepak Virmani      56.7   76.5  88.0  78.0  67.0
5      15     Jatin Malik      76.0   66.0  77.0  87.5  67.5

(b)

df['Total_marks'] = df[['Accounts', 'Maths', 'BSt', 'IP', 'Eco']].sum(axis=1)
Output
   RollNo            Name  Accounts  Maths   BSt    IP   Eco  Total_marks
0      10       Ritu Jain      88.0   67.0  87.0  97.0  56.0        395.0
1      11    Mridul Mehta      67.0   78.0  77.0  87.0  90.0        399.0
2      12           Divij      87.0   89.0  78.0  82.0  92.0        428.0
3      13    Yashvi Verma      67.0   82.3  76.5  98.2  78.6        402.6
4      14  Deepak Virmani      56.7   76.5  88.0  78.0  67.0        366.2
5      15     Jatin Malik      76.0   66.0  77.0  87.5  67.5        374.0

(c)

df['Average'] = (df['Total_marks'] / 500) * 100
Output
   RollNo            Name  Accounts  Maths   BSt    IP   Eco  Total_marks  Average
0      10       Ritu Jain      88.0   67.0  87.0  97.0  56.0        395.0    79.00
1      11    Mridul Mehta      67.0   78.0  77.0  87.0  90.0        399.0    79.80
2      12           Divij      87.0   89.0  78.0  82.0  92.0        428.0    85.60
3      13    Yashvi Verma      67.0   82.3  76.5  98.2  78.6        402.6    80.52
4      14  Deepak Virmani      56.7   76.5  88.0  78.0  67.0        366.2    73.24
5      15     Jatin Malik      76.0   66.0  77.0  87.5  67.5        374.0    74.80

Question 21

Write a program that reads students marks from a ‘Result.csv’ file and displays percentage of each student.

Solution
import pandas as pd
df = pd.read_csv('Result.csv')
df['Total_marks'] = df[['Maths', 'Science', 'English',]].sum(axis=1)
df['Percentage'] = (df['Total_marks'] / 300) * 100
print("Percentage of each student: ")
print(df[['Name', 'Percentage']])
Output
Percentage of each student: 
    Name  Percentage
0  Rahul   85.000000
1  Rohan   81.666667
2   Riya   88.333333
3    Raj   86.666667

Question 22

Write the name of function to store data from a dataframe into a CSV file.

Answer

to_csv() function is used to store data from a DataFrame into a CSV file.

Question 23

How can we import specific columns from a CSV file?

Answer

To import specific columns from a CSV file, we can use the usecols parameter of the read_csv function from the pandas library. The usecols parameter is used to specify the list of columns to be read from the CSV file. The syntax is pd.read_csv('filename.csv', usecols = ['column1', 'column2',...]).

For example, the following command will access Name and Age columns of Employee file.

df = pd.read_csv("Employee.csv", usecols = ['Name', 'Age'])
print(df)

Question 24

What are the advantages of CSV file formats?

Answer

The advantages of CSV file formats are as follows:

  1. It is a simple, compact and ubiquitous format for data storage.

  2. It is a common format for data interchange.

  3. It can be opened in popular spreadsheet packages like MS-Excel, Calc etc.

  4. Nearly all spreadsheets and databases support import/export to csv format.

Question 25

What all libraries do you require in order to bring data from a CSV file into a dataframe?

Answer

Python's Pandas library is required to bring data from a CSV file into a DataFrame.

Question 26

You want to read data from a CSV file in a dataframe but you want to provide your own column names to the dataframe. What additional argument would you specify in read_csv()?

Answer

To read data from a CSV file into a DataFrame while providing our own column names, we can use the names argument in the read_csv() function. The syntax is : <DF> = pandas.read_csv(<filepath>, names = <sequence containing column names>).

Question 27

By default, read_csv() uses the value of first row as column headers in dataframes. Which argument will you give to ensure that the top/first row’s data is used as data and not as column headers?

Answer

To ensure that the top/first row's data is used as data and not as column headers in a DataFrame when using the read_csv() function, we need to use the header argument and set it to None. The syntax is : <DF> = pandas.read_csv(<filepath>, header = None).

Question 28

Which argument would you give to read.csv() if you only want to read top 10 rows of data?

Answer

The nrows argument can be used to read only the top 10 rows of data from a CSV file using the read_csv() function in pandas. The nrows argument specifies the number of rows of the file to read. The syntax is : df = pandas.read_csv(<filepath>, nrows = 10).

Question 29

Create the following dataframe by the name Project regarding a competition and answer the questions given below:

Enrolment No.NameClassSectionProject Name
101RekhaXIIBData Analysis
102DivyaXIICGraphical Analysis
103GeetXIIHMachine Learning
104JeetXIIBApp Development

(a) Insert two records with different methods.

(b) Insert a column to store grades given to their projects.

(c) Write a command to display the name and section for all.

(d) Write a command to display the records with index value 101 and 102.

(e) Insert a column after name to store the school name.

(f) Display the second and third record.

(g) Replace the name and section of Jeet to 'XI','A'.

(h) Remove the column Project Name and Section.

Answer

The DataFrame project is created as follows:

import pandas as pd
data = {'Name': ['Rekha', 'Divya', 'Geet', 'Jeet'],
    'Class': ['XII', 'XII', 'XII', 'XII'],
    'Section': ['B', 'C', 'H', 'B'],
    'Project Name': ['Data Analysis', 'Graphical Analysis', 'Machine Learning', 'App Development']
}
Project = pd.DataFrame(data, index = [101, 102, 103, 104])
print(Project)
Output
      Name Class Section        Project Name
101  Rekha   XII       B       Data Analysis
102  Divya   XII       C  Graphical Analysis
103   Geet   XII       H    Machine Learning
104   Jeet   XII       B     App Development

(a)

Project.loc[105] = [105, 'Arya', 'XI', 'D', 'Web Development']
Project.loc[105] = ['Arya', 'XI', 'D', 'Web Development']
Project.at[106, 'Name'] = 'Vikram'
Project.at[106, 'Class'] = 'XI'
Project.at[106, 'Section'] = 'A'
Project.at[106, 'Project Name'] = 'AI Research'
Output
       Name Class Section        Project Name
101   Rekha   XII       B       Data Analysis
102   Divya   XII       C  Graphical Analysis
103    Geet   XII       H    Machine Learning
104    Jeet   XII       B     App Development
105    Arya    XI       D     Web Development
106  Vikram    XI       A         AI Research

(b)

Project['Grade'] = ['A', 'B+', 'C+', 'B', 'A+', 'C']
Output
       Name Class Section        Project Name Grade
101   Rekha   XII       B       Data Analysis     A
102   Divya   XII       C  Graphical Analysis    B+
103    Geet   XII       H    Machine Learning    C+
104    Jeet   XII       B     App Development     B
105    Arya    XI       D     Web Development    A+
106  Vikram    XI       A         AI Research     C

(c)

print(Project[['Name', 'Section']])
Output
       Name Section
101   Rekha       B
102   Divya       C
103    Geet       H
104    Jeet       B
105    Arya       D
106  Vikram       A

(d)

print(Project.loc[[101, 102]])
Output
      Name Class Section        Project Name Grade
101  Rekha   XII       B       Data Analysis     A
102  Divya   XII       C  Graphical Analysis    B+

(e)

Project.insert(1, 'School', ['ABC', 'PQR', 'ABC', 'PQR', 'XYZ', 'XYZ'])
Output
       Name School Class Section        Project Name Grade
101   Rekha    ABC   XII       B       Data Analysis     A
102   Divya    PQR   XII       C  Graphical Analysis    B+
103    Geet    ABC   XII       H    Machine Learning    c+
104    Jeet    PQR   XII       B     App Development     B
105    Arya    XYZ    XI       D     Web Development    A+
106  Vikram    XYZ    XI       A         AI Research     C

(f)

print(Project.iloc[1:3])
Output
      Name School Class Section        Project Name Grade
102  Divya    PQR   XII       C  Graphical Analysis    B+
103   Geet    ABC   XII       H    Machine Learning    c+

(g)

Project.Class[104] = 'XI'
Project.Section[104] = 'A'
Output
       Name School Class Section        Project Name Grade
101   Rekha    ABC   XII       B       Data Analysis     A
102   Divya    PQR   XII       C  Graphical Analysis    B+
103    Geet    ABC   XII       H    Machine Learning    c+
104    Jeet    PQR    XI       A     App Development     B
105    Arya    XYZ    XI       D     Web Development    A+
106  Vikram    XYZ    XI       A         AI Research     C

(h)

Project = Project.drop(['Project Name', 'Section'], axis = 1)
Output
       Name School Class Grade
101   Rekha    ABC   XII     A
102   Divya    PQR   XII    B+
103    Geet    ABC   XII    c+
104    Jeet    PQR    XI     B
105    Arya    XYZ    XI    A+
106  Vikram    XYZ    XI     C

Question 30

Consider the following dataframe: CORONA and answer the questions given below:

IDStateCases
100Delhi3000
110Mumbai4000
120Chennai5000
130Surat4500

Create the above-given dictionary with the given indexes.

(a) Write code to add a new column “Recovery” using the series method to store the number of patients recovered in every state.

(b) To add a new column “Deaths” using the assign() method to store the number of deaths in every state.

(c) To add a new row to store details of another state using loc (assume values).

(d) To add a new column "Percentage" using the insert() method to store the percentage of recovery in every state (assume values). The column should be added as the fourth column in the dataframe.

(e) To delete the column “Percentage” using del command.

(f) To delete the column “Deaths” using pop() method.

(g) To insert a new row of values using iloc[] at the 1st position.

(h) To delete Cases and State temporarily from the dataframe.

Answer

The DataFrame CORONA is created as :

import pandas as pd
data = {'State': ['Delhi', 'Mumbai', 'Chennai', 'Surat'], 
        'Cases': [3000, 4000, 5000, 4500]}
CORONA = pd.DataFrame(data, index=[100, 110, 120, 130])
print(CORONA)
Output
       State  Cases
100    Delhi   3000
110   Mumbai   4000
120  Chennai   5000
130    Surat   4500

(a)

CORONA['Recovery'] = pd.Series([2500, 3000, 3500, 3200], index=[100, 110, 120, 130])
Output
       State  Cases  Recovery
100    Delhi   3000      2500
110   Mumbai   4000      3000
120  Chennai   5000      3500
130    Surat   4500      3200

(b)

CORONA = CORONA.assign(Deaths=[200, 250, 300, 220])
Output
      State  Cases  Recovery  Deaths
100    Delhi   3000      2500     200
110   Mumbai   4000      3000     250
120  Chennai   5000      3500     300
130    Surat   4500      3200     220

(c)

CORONA.loc[140] = ['Karnataka', 4200, 2800, 180]
Output
         State  Cases  Recovery  Deaths
100      Delhi   3000      2500     200
110     Mumbai   4000      3000     250
120    Chennai   5000      3500     300
130      Surat   4500      3200     220
140  Karnataka   4200      2800     180

(d)

CORONA.insert(3, 'Percentage', [80, 75, 70, 71, 67])
Output
         State  Cases  Recovery  Percentage  Deaths
100      Delhi   3000      2500          80     200
110     Mumbai   4000      3000          75     250
120    Chennai   5000      3500          70     300
130      Surat   4500      3200          71     220
140  Karnataka   4200      2800          67     180

(e)

del CORONA['Percentage']
Output
         State  Cases  Recovery  Deaths
100      Delhi   3000      2500     200
110     Mumbai   4000      3000     250
120    Chennai   5000      3500     300
130      Surat   4500      3200     220
140  Karnataka   4200      2800     180

(f)

CORONA.pop('Deaths')
Output
        State  Cases  Recovery
100      Delhi   3000      2500
110     Mumbai   4000      3000
120    Chennai   5000      3500
130      Surat   4500      3200
140  Karnataka   4200      2800

(g) The iloc method is not used to add rows to a DataFrame. It is used for index-based or integer-location-based accessing of rows and columns, not for adding rows.

One way of adding a row at a specific position in a DataFrame is by creating a new DataFrame including the row and then concatenating the two DataFrames as shown below :

new_row = {'State': 'Hyderabad', 'Cases': 5200, 'Recovery': 3800}
new_df = pd.DataFrame([new_row], index= [150])
CORONA = pd.concat([new_df, CORONA])
Output
         State  Cases  Recovery
150  Hyderabad   5200      3800
100      Delhi   3000      2500
110     Mumbai   4000      3000
120    Chennai   5000      3500
130      Surat   4500      3200
140  Karnataka   4200      2800

(h)

CORONA.drop(['Cases', 'State'], axis=1, inplace=True)
Output
    Recovery
100      3800
110      2500
120      3000
130      3500
140      3200
5        2800

Question 31

Create a dataframe ‘Student’ from two series—Name and Grade, Name and Marks of five students.

(a) Display the first three records from student dataframe.

(b) Display the last two records from student dataframe.

Answer

The Student DataFrame is created as :

import pandas as pd

Name = ['John', 'Anna', 'Peter', 'Linda', 'Bob']
Grade = ['A', 'B', 'A+', 'C', 'B+']
Marks = [90, 80, 95, 70, 85]
S1 = pd.Series(Grade, index = Name) 
S2 = pd.Series(Marks, index = Name) 

Student = pd.DataFrame({'Grade' : S1, 'Marks' : S2 })
print(Student)
Output
      Grade  Marks
John      A     90
Anna      B     80
Peter    A+     95
Linda     C     70
Bob      B+     85

(a)

print(Student.head(3))
Output
      Grade  Marks
John      A     90
Anna      B     80
Peter    A+     95

(b)

print(Student.tail(2))
Output
      Grade  Marks
Linda     C     70
Bob      B+     85

Question 32

Create a dataframe of dictionary consisting of Name, Sub1, Sub2, Sub3, Sub4, Sub5 of five students.

(a) Display the dataframe.

(b) Display the first 5 rows and bottom 3 rows of student dataframe.

Answer

(a)

import pandas as pd

data = {
    'Name': ['Joseph', 'Ananya', 'Praneet', 'Lakshmi', 'Bhagya'],
    'Sub1': [90, 80, 95, 70, 85],
    'Sub2': [85, 90, 88, 92, 89],
    'Sub3': [88, 85, 90, 95, 92],
    'Sub4': [92, 88, 85, 90, 95],
    'Sub5': [95, 92, 92, 88, 90]
}

Student = pd.DataFrame(data)
print(Student)
Output
      Name  Sub1  Sub2  Sub3  Sub4  Sub5
0   Joseph    90    85    88    92    95
1   Ananya    80    90    85    88    92
2  Praneet    95    88    90    85    92
3  Lakshmi    70    92    95    90    88
4   Bhagya    85    89    92    95    90

(b)

print("First 5 rows:")
print(Student.head(5))

print("\nBottom 3 rows:")
print(Student.tail(3))
Output
First 5 rows:
      Name  Sub1  Sub2  Sub3  Sub4  Sub5
0   Joseph    90    85    88    92    95
1   Ananya    80    90    85    88    92
2  Praneet    95    88    90    85    92
3  Lakshmi    70    92    95    90    88
4   Bhagya    85    89    92    95    90

Bottom 3 rows:
      Name  Sub1  Sub2  Sub3  Sub4  Sub5
2  Praneet    95    88    90    85    92
3  Lakshmi    70    92    95    90    88
4   Bhagya    85    89    92    95    90

Question 33

Create two dataframes of salary of five employees and do the following:

(a) Display both the dataframes.

(b) Add 5000 as bonus in both dataframes and display them.

Answer

(a)

import pandas as pd
data1 = {'Employee': ['Jatin', 'Avinash', 'Kavya', 'Apoorva', 'Nitin'],
         'Salary': [50000, 60000, 70000, 80000, 90000]}
df1 = pd.DataFrame(data1)

data2 = {'Employee': ['Saanvi', 'Aditi', 'Shashank', 'Swapnil', 'Shravani'],
         'Salary': [55000, 45000, 30000, 85000, 66000]}
df2 = pd.DataFrame(data2)

print("DataFrame 1:")
print(df1)
print("\nDataFrame 2:")
print(df2)
Output
DataFrame 1:
  Employee  Salary
0    Jatin   50000
1  Avinash   60000
2    Kavya   70000
3  Apoorva   80000
4    Nitin   90000

DataFrame 2:
   Employee  Salary
0    Saanvi   55000
1     Aditi   45000
2  Shashank   30000
3   Swapnil   85000
4  Shravani   66000

(b)

df1['Bonus'] = 5000
df2['Bonus'] = 5000
print("DataFrame 1 with bonus:")
print(df1)
print("\nDataFrame 2 with bonus:")
print(df2)
Output
DataFrame 1 with bonus:
  Employee  Salary  Bonus
0    Jatin   50000   5000
1  Avinash   60000   5000
2    Kavya   70000   5000
3  Apoorva   80000   5000
4    Nitin   90000   5000

DataFrame 2 with bonus:
   Employee  Salary  Bonus
0    Saanvi   55000   5000
1     Aditi   45000   5000
2  Shashank   30000   5000
3   Swapnil   85000   5000
4  Shravani   66000   5000

Question 34

Create a dataframe using list [10, 11, 12, 13, 14] [23, 34, 45, 32, 65] [55, 60, 65, 70, 75] and do the following:

(a) Display the dataframe.

(b) Add the list [1, 2, 3, 4, 5] to dataframe and display it.

Answer

(a)

import pandas as pd

data = [[10, 11, 12, 13, 14],
        [23, 34, 45, 32, 65],
        [55, 60, 65, 70, 75]]
df = pd.DataFrame(data)

print(df)
Output
    0   1   2   3   4
0  10  11  12  13  14
1  23  34  45  32  65
2  55  60  65  70  75

(b)

df.loc[3] = [12, 62, 53, 34, 75]
print(df)
Output
    0   1   2   3   4
0  10  11  12  13  14
1  23  34  45  32  65
2  55  60  65  70  75
3  12  62  53  34  75

Question 35

Create a dataframe of [23, 25], [34], [43, 44, 45, 46] and do the following: :

(a) Display the dataframe. Notice that the missing value is represented by NaN.

(b) Replace the missing value with 0.

(c) Replace the missing value with -1, -2, -3, -4 for columns 0, 1, 2, 3.

(d) Replace the missing value by copying the value from the above cell.

Answer

(a)

import pandas as pd

data = [[23, 25], 
        [34], 
        [43, 44, 45, 46]]
df = pd.DataFrame(data)
print(df)
Output
    0     1     2     3
0  23  25.0   NaN   NaN
1  34   NaN   NaN   NaN
2  43  44.0  45.0  46.0

(b)

df = df.fillna(0)
Output
    0     1     2     3
0  23  25.0   0.0   0.0
1  34   0.0   0.0   0.0
2  43  44.0  45.0  46.0

(c)

df = df.fillna({0:-1, 1:-2, 2:-3, 3:-4})
Output
    0     1     2     3
0  23  25.0  -3.0  -4.0
1  34  -2.0  -3.0  -4.0
2  43  44.0  45.0  46.0

(d)

df= df.fillna(method='ffill')
Output
    0     1     2     3
0  23  25.0   NaN   NaN
1  34  25.0   NaN   NaN
2  43  44.0  45.0  46.0

Question 36

Create a dataframe of D1 and D2;

D1 = {‘Rollno’ : [1001, 1004, 1005, 1008, 1009], ‘Name’: [‘Sarika’, ‘Abhay’, ‘Mohit’, ‘Ruby’, ‘Govind’ ]}
D2 = {‘Rollno’ : [1002, 1003, 1004, 1005, 1006], ‘Name’:[‘Seema’,‘Jia’,‘Shweta’, ‘Sonia’, ‘Nishant’]}

(a) Concatenate row-wise.

(b) Concatenate column-wise.

Answer

The DataFrame D1 and D2 are created as :

import pandas as pd

Data1 = {'Rollno': [1001, 1004, 1005, 1008, 1009], 'Name': ['Sarika', 'Abhay', 'Mahit', 'Ruby', 'Govind']}
Data2 = {'Rollno': [1002, 1003, 1004, 1005, 1006], 'Name': ['Seema', 'Jia', 'Shweta', 'Sonia', 'Nishant']}

D1 = pd.DataFrame(Data1)
D2 = pd.DataFrame(Data2)
print(D1)
print(D2)
Output
   Rollno    Name
0    1001  Sarika
1    1004   Abhay
2    1005   Mahit
3    1008    Ruby
4    1009  Govind
   Rollno     Name
0    1002    Seema
1    1003      Jia
2    1004   Shweta
3    1005    Sonia
4    1006  Nishant

(a)

df = pd.concat([D1, D2])
Output
   Rollno     Name
0    1001   Sarika
1    1004    Abhay
2    1005    Mahit
3    1008     Ruby
4    1009   Govind
0    1002    Seema
1    1003      Jia
2    1004   Shweta
3    1005    Sonia
4    1006  Nishant

(b)

df = pd.concat([D1, D2], axis=1)
Output
   Rollno    Name  Rollno     Name
0    1001  Sarika    1002    Seema
1    1004   Abhay    1003      Jia
2    1005   Mahit    1004   Shweta
3    1008    Ruby    1005    Sonia
4    1009  Govind    1006  Nishant

Question 37

Create a dataframe of {‘A’ : [ ]} and display whether it is empty or not.

Solution
import pandas as pd
df = pd.DataFrame({'A': []})
print(df)

if df.empty:
    print("The dataframe is empty.")
else:
    print("The dataframe is not empty.")
Output
Empty DataFrame
Columns: [A]
Index: []
The dataframe is empty.

Question 38

Create a dataframe of {‘A’ : [5, 6], ‘B’: [3, 0], 'C': [0, 0]} and display the result of all() and any() functions.

Solution
import pandas as pd
df = pd.DataFrame({'A': [5, 6], 'B': [3, 0], 'C': [0, 0]})
print(df)

print("\nResult of all() function:")
print(df.all())

print("\nResult of any() function:")
print(df.any())
Output
   A  B  C
0  5  3  0
1  6  0  0

Result of all() function:
A     True
B    False
C    False
dtype: bool

Result of any() function:
A     True
B     True
C    False
dtype: bool

Question 39

Create a dataframe of {‘A’ : [True, True], ‘B’: [True, False], ‘C’: [False, False]} and display the result of all() and any().

Solution
import pandas as pd
df = pd.DataFrame({'A': [True, True], 'B': [True, False], 'C': [False, False]})

print(df)

print("\nResult of all() function:")
print(df.all())

print("\nResult of any() function:")
print(df.any())
Output
      A      B      C
0  True   True  False
1  True  False  False

Result of all() function:
A     True
B    False
C    False
dtype: bool

Result of any() function:
A     True
B     True
C    False
dtype: bool

Question 40

What is the use of statement: inplace=True?

Answer

When inplace=True is set, the DataFrame is modified directly, and the changes are applied to the existing DataFrame. This means that the original DataFrame is altered, and no new DataFrame is created.

Question 41

Differentiate between del, pop() and drop() functions.

Answer

del functionpop() functiondrop() function
The del statement deletes a column from a DataFrame.The pop() function is used to delete a column or an item from a DataFrame or Series.The drop() function is used to drop rows or columns from a DataFrame.
It does not return a value.It returns the removed item.It returns a new DataFrame with the dropped labels.
It modifies the original DataFrame or Series.It modifies the original DataFrame or Series.It does not modifies the original DataFrame by default (unless inplace = True is specified).
The syntax is del df['column_name'].The syntax is df['column_name'].pop().The syntax is df.drop(index or sequence of indexes).