Python Pandas — I

Checkpoint 1.1

Question 1

What is the use of Python Pandas library ?

Answer

Pandas is one of the most popular Python libraries for data analysis. It offers data I/O, computations across rows/columns, subset selection, dataset merging, handling missing data, group-wise operations, data reshaping, time-series analysis, and integrates with visualization tools.

Question 2

Name the Pandas object that can store one dimensional array like object and can have numeric or labelled indexes.

Answer

The Pandas object that can store one-dimensional array like objects with numeric or labeled indexes is called a "Series Object".

Question 3

Can you have duplicate indexes in a series object ?

Answer

Yes, a Series object can have duplicate indexes.
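As a minimal sketch of this behaviour (the values and labels below are made up for illustration), a Series can be built with a repeated label, and selecting that label returns every matching value:

```python
import pandas as pd

# Two values share the label 'a'; pandas accepts this without raising an error.
s = pd.Series([10, 20, 30], index=['a', 'a', 'b'])

print(s.index.is_unique)  # reports whether all labels are distinct
dup = s['a']              # selecting a repeated label gives a Series of all matches
print(dup)
```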

Question 4

What do these attributes of series signify ?

(i) size

(ii) itemsize

(iii) nbytes

Answer

(i) size — It returns the number of elements in the underlying data.

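(ii) itemsize — It returns the size, in bytes, of the dtype of each item of the underlying data. (Recent pandas releases no longer provide this attribute on a Series; the same value is available as <Series object>.dtype.itemsize.)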

(iii) nbytes — It returns the number of bytes in the underlying data.
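The three quantities can be checked on a small Series. This is a sketch with made-up data; because recent pandas releases no longer provide Series.itemsize, it reads the per-item size from the dtype instead, which is an assumption about the pandas version in use.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], dtype='int64')

n_items = s.size                # number of elements
item_bytes = s.dtype.itemsize   # bytes used by one int64 item
total_bytes = s.nbytes          # bytes used by all the data

print(n_items, item_bytes, total_bytes)
```

For a numeric dtype such as this one, nbytes equals size multiplied by the item size.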

Question 5

If S1 is a series object then how will len(S1) and S1.count() behave ?

Answer

len(S1) returns the total number of elements in the Series S1 object, including NaNs, while S1.count() returns the count of non-NaN values in the S1 Series object.
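A short sketch of the difference, using a hypothetical Series with two missing values:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([5, np.nan, 7, np.nan])

total = len(s1)         # counts every element, NaNs included
non_missing = s1.count()  # counts only the non-NaN values

print(total, non_missing)
```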

Question 6

What are NaNs ? How do you store them in a data structure ?

Answer

NaN stands for 'Not a Number'. In Python libraries like NumPy and Pandas, NaN is the legal empty value used to represent missing or undefined values, and we can use np.nan (with NumPy imported as np; also written np.NaN in NumPy versions before 2.0) to specify a missing value.

Question 7

True/False. Series objects always have indexes 0 to n -1.

Answer

False

Reason — Series objects allow labeled indexes and can take any user-defined labels. Hence, their indexes are not necessarily from 0 to n-1.

Question 8

What is the use of del statement ?

Answer

In the context of pandas, Python's del statement is used to delete a column from a dataframe.

Question 9

What does drop() function do ?

Answer

The drop() function in pandas is used to delete rows from a dataframe. With the argument axis = 1, it can delete columns as well.
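The two deletion tools can be compared on a small, made-up dataframe. In this sketch, del changes the dataframe itself, while drop() returns a new dataframe and leaves the original as it was:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})

del df['c']                        # removes column 'c' from df itself
fewer_rows = df.drop(0)            # new dataframe without the row labelled 0
fewer_cols = df.drop('b', axis=1)  # axis=1 makes drop() remove a column instead

print(df)
print(fewer_rows)
print(fewer_cols)
```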

Question 10

What is the role of the inplace argument in the rename() function ?

Answer

The inplace argument in the rename() function in pandas specifies whether to modify the dataframe in place or return a new dataframe with the changes. When inplace = True is set, the dataframe is modified directly, and the changes are applied to the existing dataframe. If inplace = False or not specified (default), a new dataframe with the changes is returned, leaving the original dataframe unchanged.
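The effect of inplace can be sketched as follows; the dataframe and the new column names are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Default (inplace=False): a renamed copy is returned, df keeps its columns.
renamed = df.rename(columns={'a': 'x'})

# inplace=True: df itself is modified and the call returns None.
result = df.rename(columns={'b': 'y'}, inplace=True)

print(list(renamed.columns))
print(list(df.columns))
```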

Multiple Choice Questions

Question 1

Which of the following statement will import pandas library ?

  1. Import pandas as pd
  2. import Pandas as py
  3. import pandas as pd
  4. import panda as pd

Answer

import pandas as pd

Reason — The syntax to import a library with an alias is import library_name as alias. Therefore, the statement import pandas as pd is used to import the pandas library in Python with the alias 'pd'.

Question 2

To create an empty Series object, you can use :

  1. pd.Series(empty)
  2. pd.Series(np.NaN)
  3. pd.Series( )
  4. all of these

Answer

pd.Series()

Reason — To create an empty Series object i.e., having no values, we can just use the Series() as: <Series Object> = pandas.Series().

Question 3

To specify datatype int16 for a Series object, you can write :

  1. pd.Series(data = array, dtype = int16)
  2. pd.Series(data = array, dtype = numpy.int16)
  3. pd.Series(data = array.dtype = pandas.int16)
  4. all of the above

Answer

pd.Series(data = array, dtype = numpy.int16)

Reason — The syntax to specify data type for a Series object is : <Series Object> = pandas.Series(data = None, index = None, dtype = None). Therefore, according to this syntax, pd.Series(data = array, dtype = numpy.int16) is correct.
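A brief sketch of the chosen option, with a made-up array standing in for the data:

```python
import numpy as np
import pandas as pd

array = np.array([1, 2, 3])

# The dtype argument takes a NumPy type; a bare name int16 would be undefined.
s = pd.Series(data=array, dtype=np.int16)
print(s.dtype)
```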

Question 4

To get the number of dimensions of a Series object, ............... attribute is displayed.

  1. index
  2. size
  3. itemsize
  4. ndim

Answer

ndim

Reason — The ndim attribute is used to get the number of dimensions (axis) of a Series object. The syntax is <Series object>.ndim.

Question 5

To get the size of the datatype of the items in Series object, you can display ............... attribute.

  1. index
  2. size
  3. itemsize
  4. ndim

Answer

itemsize

Reason — The itemsize attribute is used to know the number of bytes allocated to each data item in Series object. The syntax is <Series object>.itemsize.

Question 6

To get the number of elements in a Series object, ............... attribute may be used.

  1. index
  2. size
  3. itemsize
  4. ndim

Answer

size

Reason — The size attribute is used to know the number of elements in the Series object. The syntax is <Series object>.size.

Question 7

To get the number of bytes of the Series data, ............... attribute is displayed.

  1. hasnans
  2. nbytes
  3. ndim
  4. dtype

Answer

nbytes

Reason — The nbytes attribute is used to know total number of bytes taken by Series object data. The syntax is <Series object>.nbytes.

Question 8

To check if the Series object contains NaN values, ............... attribute is displayed.

  1. hasnans
  2. nbytes
  3. ndim
  4. dtype

Answer

hasnans

Reason — The hasnans attribute is used to check if a Series object contains some NaN value or not. The syntax is <Series object>.hasnans.

Question 9

To display third element of a Series object S, you will write ............... .

  1. S[:3]
  2. S[2]
  3. S[3]
  4. S[:2]

Answer

S[2]

Reason — The syntax to access individual elements of a Series object is <Series Object name>[<valid index>]. Therefore, according to this syntax, to display third element of a Series object S with zero based indexing, S[2] is correct.

Question 10

To display first three elements of a Series object S, you may write ............... .

  1. S[:3]
  2. S[3]
  3. S[3rd]
  4. all of these

Answer

S[:3]

Reason — The syntax to extract slices from Series object is <Series Object>[start:end:step]. Therefore, according to this syntax, the correct slice notation to display the first three elements of a Series object S is S[:3].
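Both forms of access can be tried on a sample Series. This sketch assumes the default index 0, 1, 2, ... so that the label 2 coincides with the third position; the data is made up.

```python
import pandas as pd

S = pd.Series([11, 22, 33, 44, 55])

third = S[2]          # with the default index, label 2 is the third element
first_three = S[:3]   # slice covering positions 0, 1 and 2

print(third)
print(first_three)
```

When a Series carries custom labels, S.iloc[2] and S.iloc[:3] select by position regardless of the labels.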

Question 11

To display last five rows of a Series object S, you may write ............... .

  1. head()
  2. head(5)
  3. tail()
  4. tail(5)

Answer

tail(), tail(5)

Reason — The syntax to display the last n rows of a Series object is <Series Object>.tail([n]). Therefore, according to this syntax, tail(5) will display last five rows of a Series object S. If n value is not specified, then tail() will return the last 5 rows of a Series object.
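A short sketch on a hypothetical ten-element Series shows the default of five rows and an explicit count:

```python
import pandas as pd

S = pd.Series(range(1, 11))

last_five = S.tail()     # n defaults to 5
last_three = S.tail(3)   # explicit n
first_four = S.head(4)   # head() works the same way from the top

print(last_five)
```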

Question 12

Which of the following statement is wrong ?

  1. We can't change the index of the series
  2. We can easily convert the list, tuple, and dictionary into a series
  3. A series represents a single column in memory
  4. We can create empty series.

Answer

We can't change the index of the series

Reason — We can change or rename the indexes of a Series object by assigning a new index array to its index attribute. The syntax is <Object>.index = <new index array>.
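Reassigning the index can be sketched like this, with made-up values and labels:

```python
import pandas as pd

S = pd.Series([1, 2, 3])
old_index = list(S.index)   # default labels 0, 1, 2

# The new index must have as many labels as the Series has values.
S.index = ['x', 'y', 'z']

print(old_index, list(S.index))
```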

Question 13

What type of error is returned by the following statement ?

import pandas as pa
pa.Series([1, 2, 3, 4], index = ['a', 'b', 'c'])
  1. Value Error
  2. Syntax Error
  3. Name Error
  4. Logical Error

Answer

Value Error

Reason — When specifying indexes explicitly using an index sequence, we must provide indexes equal to the number of values in the data array. Providing fewer or more indices will lead to an error, i.e., a ValueError.
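The error can be observed safely by wrapping the statement in a try block. This is only a sketch; the variable error_name is introduced here for illustration.

```python
import pandas as pd

try:
    # Four values but only three index labels.
    pd.Series([1, 2, 3, 4], index=['a', 'b', 'c'])
    error_name = None
except ValueError as exc:
    error_name = type(exc).__name__

print(error_name)
```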

Question 14

What will be the output of the following code ?

import pandas as pd
myser = pd.Series([0, 0, 0])
print(myser)
  1. 0 0
    0 0
    0 0
  2. 0 1
    0 1
    0 2
  3. 0 0
    1 0
    2 0
  4. 0 0
    1 1
    2 2

Answer

0 0
1 0
2 0

Reason — The code creates a pandas Series object myser with three elements [0, 0, 0], and when we print the Series, it displays the index along with the corresponding values. Since the Series is created with default indexes (0, 1, 2), the output shows the index values (0, 1, 2) along with the corresponding values (0, 0, 0).

Question 15

To display last five rows of a series object 'S', you may write :

  1. S.Head()
  2. S.Tail(5)
  3. S.Head(5)
  4. S.tail()

Answer

S.tail()

Reason — The syntax to display the last n rows of a Series object is <Series Object>.tail([n]). Therefore, according to this syntax, S.tail() will display last five rows of a Series object S.

Question 16

Missing data in Pandas object is represented through :

  1. Null
  2. None
  3. Missing
  4. NaN

Answer

NaN

Reason — NaN stands for 'Not a Number'. In Python libraries like NumPy and Pandas, NaN is the legal empty value used to represent missing or undefined values, and we can use np.nan (also written np.NaN in NumPy versions before 2.0) to specify a missing value.

Question 17

In Python Pandas, while performing mathematical operations on series, index matching is implemented and all missing values are filled in with ............... by default.

  1. Null
  2. Blank
  3. NaN
  4. Zero

Answer

NaN

Reason — When performing mathematical operations on pandas Series objects, index matching is implemented (this is called data alignment in Pandas objects), and missing values are filled with NaN (Not a Number) by default.
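Data alignment can be seen by adding two Series whose indexes only partly overlap. The Series below are hypothetical examples:

```python
import pandas as pd

a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20], index=['y', 'z'])

# 'x' exists only in a, so the result holds NaN for it.
total = a + b
print(total)
```

The label 'x' stays in the result; it is not dropped, only its value is missing.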

Question 18

Given a Pandas series called Sequences, the command which will display the first 4 rows is ............... .

  1. print(Sequences.head(4))
  2. print(Sequences.Head(4))
  3. print(Sequences.heads(4)
  4. print(Sequences.Heads(4))

Answer

print(Sequences.head(4))

Reason — The syntax to display the first n rows from a Series object is <Series object>.head([n]). Therefore, according to this syntax, the command to display the first 4 rows of Sequences is print(Sequences.head(4)).

Question 19

Which of the following statement is wrong in context of DataFrame ?

  1. Two dimensional size is Mutable
  2. Can Perform Arithmetic operations on rows and columns.
  3. Homogeneous tabular data structure
  4. Create DataFrame from numpy ndarray

Answer

Homogeneous tabular data structure

Reason — The pandas DataFrames can hold heterogeneous data, meaning each column can have a different data type.

Question 20

Which of the following is a two-dimensional labelled data structure of Python ?

  1. Relation
  2. DataFrame
  3. Series
  4. Square

Answer

DataFrame

Reason — A DataFrame is a two-dimensional labelled array like Pandas data structure that stores an ordered collection of columns that can store data of different types.

Question 21

When we create a DataFrame from a list of Dictionaries the columns labels are formed by the :

  1. Union of the keys of the dictionaries
  2. Intersection of the keys of the dictionaries
  3. Union of the values of the dictionaries
  4. Intersection of the values of the dictionaries

Answer

Union of the keys of the dictionaries

Reason — When we create a DataFrame from a list of dictionaries, the column labels are formed by the union of the keys of the dictionaries.
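A small sketch with two made-up dictionaries whose keys differ shows the union of keys becoming the columns, with NaN where a dictionary lacks a key:

```python
import pandas as pd

rows = [{'a': 1, 'b': 2}, {'b': 3, 'c': 4}]
df = pd.DataFrame(rows)

print(df)
```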

Question 22

If a DataFrame is created using a 2D dictionary, then the indexes/row labels are formed from ............... .

  1. dictionary's values
  2. inner dictionary's keys
  3. outer dictionary's keys
  4. none of these

Answer

inner dictionary's keys

Reason — When a DataFrame is created using a 2D dictionary, then the indexes/row labels are formed from keys of inner dictionaries.

Question 23

If a dataframe is created using a 2D dictionary, then the column labels are formed from ............... .

  1. dictionary's values
  2. inner dictionary's keys
  3. outer dictionary's keys
  4. none of these

Answer

outer dictionary's keys

Reason — When a DataFrame is created using a 2D dictionary, then the column labels are formed from the keys of the outer dictionary.
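The roles of the outer and inner keys can be checked with a hypothetical 2D dictionary (the name sales and its figures are made up):

```python
import pandas as pd

sales = {
    '2023': {'Q1': 100, 'Q2': 150},
    '2024': {'Q1': 120, 'Q2': 170},
}
df = pd.DataFrame(sales)

print(list(df.columns))  # taken from the outer keys
print(list(df.index))    # taken from the inner keys
```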

Question 24

Which of the following can be used to specify the data while creating a DataFrame ?

  1. Series
  2. List of Dictionaries
  3. Structured ndarray
  4. All of these

Answer

All of these

Reason — We can create a DataFrame object by passing data in many different ways, such as two-dimensional dictionaries (i.e., dictionaries having lists or dictionaries or ndarrays or series objects etc), two-dimensional ndarrays, series type object and another DataFrame object.

Question 25

The axis 0 identifies a DataFrame's ............... .

  1. rows
  2. columns
  3. values
  4. datatype

Answer

rows

Reason — The axis 0 identifies a DataFrame's row index.

Question 26

The axis 1 identifies a DataFrame's ............... .

  1. rows
  2. columns
  3. values
  4. datatype

Answer

columns

Reason — The axis 1 identifies a DataFrame's column index.

Question 27

To get the number of elements in a dataframe, ............... attribute may be used.

  1. size
  2. shape
  3. values
  4. ndim

Answer

size

Reason — The size attribute will return an integer representing the number of elements in a DataFrame object.

Question 28

To get NumPy representation of a dataframe, ............... attribute may be used.

  1. size
  2. shape
  3. values
  4. ndim

Answer

values

Reason — The values attribute will return a NumPy representation of the DataFrame.

Question 29

To get a number representing number of axes in a dataframe, ............... attribute may be used.

  1. size
  2. shape
  3. values
  4. ndim

Answer

ndim

Reason — The ndim attribute will return an integer representing the number of axes/array dimensions.
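The attributes from the last three questions can be compared side by side on a small, made-up dataframe:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

print(df.size)    # total number of elements (rows * columns)
print(df.shape)   # (rows, columns)
print(df.ndim)    # number of axes
print(df.values)  # the data as a NumPy array
```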

Question 30

Which attribute is not used with DataFrame ?

  1. Size
  2. Type
  3. Empty
  4. Columns

Answer

Type

Reason — The type attribute is not used with DataFrame.

Question 31

To get the transpose of a dataframe D1, you can write ............... .

  1. D1.T
  2. D1.Transpose
  3. D1.Swap
  4. All of these

Answer

D1.T

Reason — We can transpose a DataFrame by swapping its indexes and columns using the attribute T, with the syntax DataFrame.T. Therefore, D1.T is used to get the transpose of a DataFrame D1.

Question 32

To extract row/column from a dataframe, ............ function may be used.

  1. row()
  2. column()
  3. loc()
  4. All of these

Answer

loc()

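Reason — loc is used to access row(s) and/or a combination of rows and columns from a DataFrame object by their labels. Although the question calls it a function, it is written with square brackets, as <DF object>.loc[<rows>, <columns>].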

Question 33

To display the 3rd, 4th and 5th columns from the 6th to 9th rows of a dataframe DF, you can write ............... .

  1. DF.loc[6:9, 3:5]
  2. DF.loc[6:10, 3:6]
  3. DF.iloc[6:10, 3:6]
  4. DF.iloc[6:9, 3:5]

Answer

DF.iloc[6:10, 3:6]

Reason — To display a subset of a dataframe using numeric row and column positions, iloc is used with the syntax <DF object>.iloc[<start row index>:<end row index>, <start col index>:<end col index>], where the end index is excluded. Therefore, DF.iloc[6:10, 3:6] selects rows 6 to 9 and columns 3 to 5, which is the required subset when the row and column numbers in the question are read as zero-based positions.
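The slice can be verified on a dataframe large enough to contain it. This is a sketch; the 12 by 10 grid of numbers is made up purely so that the selected positions are easy to see.

```python
import numpy as np
import pandas as pd

# 12 rows and 10 columns with default labels 0..11 and 0..9.
DF = pd.DataFrame(np.arange(120).reshape(12, 10))

subset = DF.iloc[6:10, 3:6]   # end positions 10 and 6 are excluded
print(subset)
```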

Question 34

To change the 5th column's value at 3rd row as 35 in dataframe DF, you can write ............... .

  1. DF[4, 6] = 35
  2. DF[3, 5] = 35
  3. DF.iat[4, 6] = 35
  4. DF.iat[3, 5] = 35

Answer

DF.iat[3, 5] = 35

Reason — The syntax to modify a value using row and column positions is <DataFrame>.iat[<row position>, <column position>] = <value>. Therefore, reading the numbers in the question as positions, DF.iat[3, 5] = 35 sets the value at row position 3 and column position 5 of dataframe DF to 35.

Question 35

Which among the following options can be used to create a DataFrame in Pandas ?

  1. A scalar value
  2. An ndarray
  3. A python dict
  4. All of these

Answer

All of these

Reason — We can create a DataFrame object in Pandas by passing data in many different ways, such as a scalar value (when the index and columns are also given), an ndarray and a Python dictionary.

Question 36

Identify the correct statement :

  1. The standard marker for missing data in Pandas is NaN.
  2. Series act in a way similar to that of an array
  3. Both (a) and (b)
  4. None of the above

Answer

Both (a) and (b)

Reason — NaN stands for "Not a Number" and is used in Pandas to represent missing or undefined values in a Series or DataFrame. A Series in Pandas is similar to a one-dimensional array or list in Python. It has an index and a corresponding array of data values. Series can be accessed, sliced, and manipulated in ways similar to arrays.

Question 37

Identify the correct option to select first four rows and second to fourth columns from a DataFrame 'Data':

  1. display(Data.iloc[1 : 4, 2 : 4])
  2. display(Data.iloc[1 : 5, 2 ; 5])
  3. print(Data.iloc[0 : 4, 1 : 4])
  4. print(Data.iloc[1 : 4, 2 : 4])

Answer

print(Data.iloc[0 : 4, 1 : 4])

Reason — To display subset from dataframe using row and column numeric index/position, iloc is used with syntax <DF object>.iloc[<start row index>:<end row index>, <start col index>:<end col index>]. Therefore, according to this syntax, print(Data.iloc[0 : 4, 1 : 4]) is correct statement to display first four rows and second to fourth columns from a DataFrame Data.

Question 38

To delete a column from a DataFrame, you may use ............... statement.

  1. remove
  2. del
  3. drop
  4. cancel

Answer

del

Reason — The del statement is used to delete a column from a DataFrame.

Question 39

To delete a row from a DataFrame, you may use ............... statement.

  1. remove
  2. del
  3. drop
  4. cancel

Answer

drop

Reason — The drop() function is used to delete a row from a DataFrame.

Question 40

Sudhanshu has written the following code to create a DataFrame with boolean index :

import numpy as np
import pandas as pd
df = pd.DataFrame(data = [[5, 6, 7]], index = [true, false, true])
print(df)

While executing the code, she is getting an error. Help her to rectify the code :

  1. df = pd.DataFrame([True, False, True], data = [5, 6, 7])
  2. df = pd.DataFrame(data = [5, 6, 7], index = [True, False, True])
  3. df = pd.DataFrame([true, false, true], data = [5, 6, 7])
  4. df = pd.DataFrame(index = [true, false, true], data = [[5, 6, 7]])

Answer

df = pd.DataFrame(data = [5, 6, 7], index = [True, False, True])

Reason — Python's boolean values are written True and False, with the first letter capitalized; true and false are undefined names. Also, data = [[5, 6, 7]] describes a single row of three values, which does not match the three index labels, so the data should be given as the list [5, 6, 7]. Hence, df = pd.DataFrame(data = [5, 6, 7], index = [True, False, True]) is correct.
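The corrected statement can be run as follows. Since no column names are given, pandas labels the single column 0 in this sketch.

```python
import pandas as pd

# Three values, one per boolean index label.
df = pd.DataFrame(data=[5, 6, 7], index=[True, False, True])
print(df)
```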

Fill in the Blanks

Question 1

Pandas is a popular data-science library of Python.

Question 2

A series is a Pandas data structure that represents a 1 D array like object.

Question 3

A DataFrame is a Pandas data structure that represents a 2 D array like object.

Question 4

You can use numpy.NaN for missing data.

Question 5

To specify datatype for a Series object, dtype argument is used.

Question 6

The len() function on Series object returns total elements in it including NaNs.

Question 7

The count() function on Series object returns only the count of non-NaN values in it.

Question 8

Series is value mutable.

Question 9

Series is not size mutable.

Question 10

DataFrame is size mutable as well as value mutable.

Question 11

In a DataFrame, Axis = 1 represents the column elements.

Question 12

To access values using row labels you can use DF.loc .

Question 13

To access individual value, you can use DF.at using row/column index labels.

Question 14

To access individual value, you can use DF.iat using row/column integer position.

Question 15

The rename() function requires inplace argument to make changes in the original dataframe.

True/False Questions

Question 1

A Pandas Series object can be thought of as a column or a row, essentially.

Answer

True

Reason — A single column of a DataFrame is a Series object whose index is the DataFrame's row labels. Similarly, a single row extracted from a DataFrame is a Series object whose index is the DataFrame's column labels. Hence, a Series can be thought of as either a column or a row.

Question 2

Both Series and DataFrame are one-dimensional data structure objects.

Answer

False

Reason — The Series is a one-dimensional data structure object, while the DataFrame is a two-dimensional data structure object.

Question 3

While series is a one-dimensional data structure object, dataframe is a multi-dimensional data structure object.

Answer

True

Reason — The Series is a one-dimensional data structure object, while the DataFrame is a multi-dimensional data structure object.

Question 4

A Series object is value mutable.

Answer

True

Reason — The values within a Series object can be modified after the Series is created, hence they are referred to as value mutable.

Question 5

A Series object is size mutable.

Answer

False

Reason — The size of a Series object, once created, cannot change. If we want to add/drop an element, internally a new Series object will be created. Therefore, they are considered size immutable.

Question 6

A DataFrame object is value mutable.

Answer

True

Reason — The values within a DataFrame object can be modified after it is created, hence they are referred to as value mutable.

Question 7

A DataFrame object is size mutable.

Answer

True

Reason — The size of a DataFrame object can change in place after it is created. This means we can add or drop elements in an existing DataFrame object. Therefore, they are considered size mutable.

Question 8

There is no difference between a NumPy array and a Series object.

Answer

False

Reason — NumPy arrays can perform vectorized operations on two arrays only if their shapes match, while Series objects can perform vectorized operations on two Series objects even if their shapes differ, using NaN for non-matching indexes. Additionally, Series objects consume more memory compared to NumPy arrays. Hence, NumPy array and Series object are different.

Question 9

A DataFrame can be thought of as a group of multiple Series objects.

Answer

True

Reason — A DataFrame can be thought of as a collection of multiple Series objects. This is because a DataFrame can be created using multiple Series objects. For example, in a 2D dictionary, the values can be represented as Series objects, and by passing this dictionary as an argument, a DataFrame object can be created.

Question 10

A DataFrame has similar properties as a Series object.

Answer

False

Reason — A DataFrame is a 2-dimensional, heterogeneous, size-mutable data structure with 2 indexes, while a Series is a 1-dimensional, homogeneous, size-immutable data structure with 1 index.

Question 11

A Series object can store only homogeneous (same type of) elements.

Answer

True

Reason — A Series object in Pandas can store only homogeneous elements, meaning all elements must be of the same data type.

Question 12

A DataFrame object can store only homogeneous elements.

Answer

False

Reason — A DataFrame object can store heterogeneous elements, meaning it can have elements of different data types.

Question 13

The del statement can remove the rows as well as columns in a dataframe.

Answer

False

Reason — The del statement is used to delete columns in a DataFrame, while the drop() function is used to delete rows from a DataFrame.

Question 14

The rename() always makes changes in the default dataframe.

Answer

False

Reason — The rename() function makes changes in the original DataFrame only if the inplace argument is set to True. If it is set to False (the default), it returns a new DataFrame with the changes applied and leaves the original unchanged.

Assertions and Reasons

Question 1

Assertion (A). To use the Pandas library in a Python program, one must import it.

Reasoning (R). The only alias name that can be used with the Pandas library is pd.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

A is true but R is false.

Explanation
In order to work with Pandas in Python, we need to import the Pandas library into our Python environment using the statement import pandas as pd. While pd is a common alias used with the Pandas library, it's not the only alias that can be used. We can import Pandas using other alias names as well.

Question 2

Assertion. A series is a 1D data structure which is value-mutable but size-immutable.

Reason. Every time you change the size of a series object, change does not take place in the existing series object, rather a new series object is created with the new size.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

Both A and R are true and R is the correct explanation of A.

Explanation
A series is a one-dimensional data structure that is value-mutable but size-immutable. This means that we can modify the values within a series, but we cannot change its size once it's created. Every time we attempt to change the size of a series object by adding or dropping an element, internally a new series object is created with the new size.

Question 3

Assertion. A dataframe is a 2D data structure which is value mutable and size mutable.

Reason. Every change in a dataframe internally creates a new dataframe object.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

A is true but R is false.

Explanation
A DataFrame is a two-dimensional data structure that is both value-mutable and size-mutable. This means that we can modify the values within a DataFrame, change its size once it's created, and add or drop elements in an existing DataFrame object without creating a new DataFrame internally.

Question 4

Assertion. A dataframe is value mutable and size-mutable.

Reason. All changes occur in-place in a dataframe.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

Both A and R are true and R is the correct explanation of A.

Explanation
A DataFrame is a two-dimensional data structure that is both value-mutable and size-mutable. This means that we can modify the values within a DataFrame, change its size in place once it's created, and add or drop elements in an existing DataFrame object without creating a new DataFrame internally.

Question 5

Assertion. A series object stores values of homogeneous types.

Reason. Even if values appear to be of different types, internally they are stored in a common datatype.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

Both A and R are true and R is the correct explanation of A.

Explanation
A Series object in Pandas stores values of homogeneous types, meaning all values are of the same data type. Even if values appear to be of different types, internally they are stored in a common datatype.
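The common datatype can be inspected through the dtype attribute. In this sketch with made-up values, integers mixed with a float are stored as floats, and numbers mixed with a string are stored under the general object dtype:

```python
import pandas as pd

mixed_numbers = pd.Series([1, 2.5, 3])     # ints and a float
mixed_types = pd.Series([1, 'two', 3.0])   # numbers and a string

print(mixed_numbers.dtype)
print(mixed_types.dtype)
```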

Question 6

Assertion. Arithmetic operations on two series objects take place on matching indexes.

Reason. Non-matching indexes are removed from the result of arithmetic operation on series objects.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

A is true but R is false.

Explanation
Arithmetic operations on two Series objects take place on matching indexes. Non-matching indexes are not removed from the result. Instead, Pandas aligns the indexes, performs the operation on the values with matching indexes, and returns NaN (Not a Number) for the non-matching indexes of both objects.

Question 7

Assertion. Arithmetic operations on two series objects take place on matching indexes.

Reason. For non-matching indexes of series objects in an arithmetic operation, NaN is returned.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

Both A and R are true and R is the correct explanation of A.

Explanation
Arithmetic operations on two Series objects take place on matching indexes. When performing operations on objects with non-matching indexes, Pandas aligns the indexes and performs the operation on the values with matching indexes, resulting in NaN (Not a Number) for the non-matching indexes of both objects.

Question 8

Assertion. While changing the values of a column in a dataframe, if the column does not exist, an error occurs.

Reason. If values are provided for a non-existing column in a dataframe, a new column is added with those values.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

A is false but R is true.

Explanation
Changing the values of a column that does not exist in a dataframe does not cause an error. Instead, a new column with those values is added to the dataframe.
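This behaviour can be sketched with a one-column dataframe and a hypothetical new column 'b':

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

# 'b' does not exist yet, so the assignment adds it instead of raising an error.
df['b'] = [10, 20, 30]
print(df)
```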

Question 9

Assertion. .loc() is a label based data selecting method to select a specific row(s) or column(s) which we want to select.

Reason. .iloc() can not be used with default indices if customized indices are provided.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

A is true but R is false.

Explanation
The .loc[] indexer is a label-based method in Pandas used for selecting specific rows or columns based on their labels (indices). The .iloc[] indexer performs integer-location based indexing, so it can be used with the default 0-based integer positions even if customized indices are provided.
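Both indexers can reach the same cell of a dataframe that has customized row labels. The labels and marks in this sketch are made up:

```python
import pandas as pd

df = pd.DataFrame({'marks': [70, 80, 90]}, index=['r1', 'r2', 'r3'])

by_label = df.loc['r2', 'marks']   # selection by row and column label
by_position = df.iloc[1, 0]        # selection by 0-based position, labels ignored

print(by_label, by_position)
```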

Question 10

Assertion. DataFrame has both a row and column index.

Reason. A DataFrame is a two-dimensional labelled data structure like a table of MySQL.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

Both A and R are true and R is the correct explanation of A.

Explanation
A DataFrame in Pandas has both a row index and a column index. It is a two-dimensional labeled data structure, similar to a table in MySQL, in which each value is identifiable by the combination of its row and column indices.

Type A: Very Short Answer Questions

Question 1

What is the significance of Pandas library ?

Answer

The significance of Python Pandas library is as follows:

  1. It can read or write in many different data formats (integer, float, double, etc.).
  2. It can calculate in all the possible ways data is organized i.e., across rows and down columns.
  3. It can easily select subsets of data from bulky data sets and even combine multiple datasets together. It has functionality to find and fill missing data.
  4. It allows operations to be applied to independent groups within the data.
  5. It supports reshaping of data into different forms.
  6. It supports advanced time-series functionality.
  7. It supports visualization by integrating with libraries such as matplotlib and seaborn.

Question 2

Name some common data structures of Python's Pandas library.

Answer

The common data structures of Python's Pandas library are Series and DataFrame.

Question 3

How is a Series object different from and similar to ndarrays ? Support your answer with examples.

Answer

A Series object in Pandas is both similar to and different from ndarrays (NumPy arrays).

Similarities:

Both Series and ndarrays store homogeneous data, meaning all elements must be of the same data type (e.g., integers, floats, strings).

Differences:

Series Object | ndarrays
It supports explicit indexing, i.e., we can programmatically choose, provide and change indexes in terms of numbers or labels. | It does not support explicit indexing, only supports implicit indexing whereby the indexes are implicitly given 0 onwards.
It supports indexes of numeric as well of string types. | It supports indexes of only numeric types.
It can perform vectorized operations on two series objects, even if their shapes are different by using NaN for non-matching indexes/labels. | It can perform vectorized operations on two ndarrays only if their shapes match.
It takes more memory compared to a numpy array. | It takes lesser memory compared to a Series object.
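Since the question asks for examples, the comparison above can be illustrated with a short sketch. The variable names (arr, ser, other) and the sample values are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# An ndarray only has implicit positions 0, 1, 2, ...
arr = np.array([10, 20, 30])

# A Series can carry explicit labels chosen by the programmer.
ser = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Similarity: both support vectorized arithmetic on homogeneous data.
arr_doubled = arr * 2
ser_doubled = ser * 2

# Difference: Series arithmetic aligns on labels, so a label present in
# only one operand ('c' here) gives NaN in the result.
other = pd.Series([1, 2], index=['a', 'b'])
aligned_sum = ser + other

# Two ndarrays of lengths 3 and 2 cannot be added at all.
try:
    arr + np.array([1, 2])
    shapes_clash = False
except ValueError:
    shapes_clash = True

print(arr_doubled)
print(ser_doubled)
print(aligned_sum)
print("ndarray shape mismatch raised ValueError:", shapes_clash)
```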

Question 4

Write single line Pandas statement for the following. (Assuming necessary modules have been imported) :

Declare a Pandas series named Packets having dataset as :

[125, 92, 104, 92, 85, 116, 87, 90]

Answer

Packets = pandas.Series([125, 92, 104, 92, 85, 116, 87, 90], name = 'Packets')

Question 5

Write commands to print following details of a Series object seal :

(a) if the series is empty

(b) indexes of the series

(c) The data type of underlying data

(d) if the series stores any NaN values

Answer

(a)

seal.empty

(b)

seal.index

(c)

seal.dtype

(d)

seal.hasnans

Question 6

Given the following Series S1 and S2 :

S1
A    10
B    40
C    34
D    60

S2
A    80
B    20
C    74
D    90

Write the command to find the sum of series S1 and S2.

Answer

>>> print(S1 + S2)
Output
A     90
B     60
C    108
D    150
dtype: int64

Question 7

Consider two objects x and y. x is a list whereas y is a Series. Both have values 20, 40, 90, 110.

What will be the output of the following two statements considering that the above objects have been created already ?

(a) print (x*2)

(b) print (y*2)

Justify your answer.

Answer

(a)

Output
[20, 40, 90, 110, 20, 40, 90, 110]

In the first statement, x is a list. Multiplying a list by 2 (x*2) repeats the whole list twice, i.e., it concatenates the list with a copy of itself; the individual values are not changed.

(b)

Output
0     40
1     80
2    180
3    220
dtype: int64

In the second statement, y is a Series. When a Series is multiplied by a number, each element is multiplied by that number (here 2), because Series supports vectorized operations.
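The two behaviours can be checked side by side with the following sketch. It rebuilds x and y from the values in the question; the list comprehension at the end is an extra illustration of how element-wise doubling would be written for a plain list.

```python
import pandas as pd

x = [20, 40, 90, 110]
y = pd.Series([20, 40, 90, 110])

# List repetition: the whole list is repeated, values stay unchanged.
repeated = x * 2

# Vectorized arithmetic: every element of the Series is doubled.
scaled = y * 2

# Element-wise doubling of a list needs an explicit loop or comprehension.
doubled_list = [value * 2 for value in x]

print(repeated)
print(scaled)
print(doubled_list)
```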

Question 8

Given a dataframe df as shown below :

    A   B   D
0  15  17  19
1  16  18  20
2  20  21  22

What will be the result of following code statements ?

(a) df['C'] = np.NaN

(b) df['C'] = [2, 5]

(c) df['C'] = [12, 15, 27]

Answer

(a) df['C'] = np.NaN: This statement adds a new column 'C' to the dataframe and assigns NaN (Not a Number) to every row of this column. Note that NumPy 2.0 removed the np.NaN alias, so with current NumPy versions the statement has to be written as df['C'] = np.nan.

The updated dataframe will look like this:

Output
    A   B   D   C
0  15  17  19 NaN
1  16  18  20 NaN
2  20  21  22 NaN

(b) df['C'] = [2, 5]: This statement raises a ValueError because the length of the list [2, 5] (2) does not match the number of rows in the DataFrame df (3).

(c) df['C'] = [12, 15, 27]: This statement adds a new column 'C' to the dataframe and assigns the values from the list [12, 15, 27] to it, one value per row.

The updated dataframe will look like this:

Output
    A   B   D   C
0  15  17  19  12
1  16  18  20  15
2  20  21  22  27
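The three assignments can be tried in sequence with the sketch below. It rebuilds df from the table in the question and uses np.nan (the spelling that works on all NumPy versions) in place of np.NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [15, 16, 20], 'B': [17, 18, 21], 'D': [19, 20, 22]})

# (a) A scalar is broadcast to every row of the new column.
df['C'] = np.nan
all_missing = bool(df['C'].isna().all())

# (b) A list whose length differs from the number of rows is rejected.
try:
    df['C'] = [2, 5]
    length_error = None
except ValueError as exc:
    length_error = str(exc)

# (c) A list with one value per row is accepted.
df['C'] = [12, 15, 27]

print(all_missing)
print(length_error)
print(df)
```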

Question 9

Write code statements to list the following, from a dataframe namely sales:

(a) List only columns 'Item' and 'Revenue'.

(b) List rows from 3 to 7.

(c) List the value of cell in 5th row, 'Item' column.

Answer

(a)

>>> sales[['Item', 'Revenue']]

(b)

>>> sales.iloc[2:7]

(c)

>>> sales.Item[4]
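A minimal sketch of these three selections on made-up data follows. Only the column names 'Item' and 'Revenue' come from the question; the 'Qty' column, the values and the letter index are assumptions. Note that sales.Item[4] looks up the label 4, so it works when the DataFrame has the default 0-based index; the sketch uses .iloc[4], which picks the 5th row by position whatever the index labels are.

```python
import pandas as pd

# Hypothetical data with a non-default index.
sales = pd.DataFrame(
    {
        'Item': ['Pen', 'Ink', 'Pad', 'Clip', 'Tape', 'Glue', 'Chalk', 'Ruler'],
        'Qty': [10, 4, 7, 30, 6, 9, 12, 5],
        'Revenue': [100, 80, 210, 60, 90, 135, 48, 75],
    },
    index=list('abcdefgh'),
)

two_columns = sales[['Item', 'Revenue']]   # (a) only the two named columns
rows_3_to_7 = sales.iloc[2:7]              # (b) positions 2..6, i.e. 3rd to 7th row
fifth_item = sales['Item'].iloc[4]         # (c) 5th row of the 'Item' column

print(two_columns)
print(rows_3_to_7)
print(fifth_item)
```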

Question 10

Hitesh wants to display the last four rows of the dataframe df and has written the following code :

df.tail()

But last 5 rows are being displayed. Identify the error and rewrite the correct code so that last 4 rows get displayed.

Answer

The error in Hitesh's code is that the tail() function in pandas by default returns the last 5 rows of the dataframe. To display the last 4 rows, Hitesh needs to specify the number of rows he wants to display.

Here's the correct code:

df.tail(4)

Question 11

How would you add a new column namely 'val' to a dataframe df that has 10 rows in it and has columns as 'Item', 'Qty', 'Price' ? You can choose to put any values of your choice.

Answer

The syntax to add a new column to a DataFrame is <DF object>[<column>] = <new value>. Therefore, according to this syntax, the statement to add a column named 'val' to a dataframe df with 10 rows is :

df['val'] = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Question 12

Write code statements for a dataframe df for the following :

(a) delete an existing column from it.

(b) delete rows from 3 to 6 from it.

(c) Check if the dataframe has any missing values.

(d) fill all missing values with 999 in it.

Answer

(a)

>>> del df[<column_name>]

(b)

>>> df.drop(range(2, 6))

(c)

>>> df.isnull().values.any()

(d)

>>> df.fillna(999)
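The four statements can be tried together on a small DataFrame. This is a sketch on assumed data (the column names P, Q, R and the values are made up). It also shows that drop() and fillna() return new DataFrames and leave df unchanged unless the result is assigned back, and that df.isnull() alone returns a table of True/False values, which .values.any() reduces to a single answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'P': range(8),
    'Q': [1.0, np.nan, 3.0, 4.0, np.nan, 6.0, 7.0, 8.0],
    'R': list('abcdefgh'),
})

# (a) del removes the column from df itself.
del df['R']

# (b) Labels 2, 3, 4, 5 are the 3rd to 6th rows of a default index.
trimmed = df.drop(range(2, 6))

# (c) One True/False answer for the whole DataFrame.
has_missing = bool(df.isnull().values.any())

# (d) Every NaN is replaced by 999 in the returned copy.
filled = df.fillna(999)

print(trimmed)
print(has_missing)
print(filled)
```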

Question 13

Write statement(s) to delete a row from a DataFrame.

Answer

The statement to delete a row from a DataFrame is:

<DF>.drop(index).

For example, the statement to delete the second row from a dataframe df is df.drop(1).

Question 14

Write statement(s) to delete a column from a DataFrame.

Answer

The statements to delete a column from a DataFrame are:

del <Df object>[<column name>]

OR

df.drop([<column name>], axis = 1).

For example, the statement to delete a column Population from a dataframe df is del df['Population'] or df.drop('Population', axis = 1).

Question 15

Write statement(s) to change the value at 5th row, 6th column in a DataFrame df.

Answer

The statement to change the value at 5th row, 6th column in a DataFrame df is:

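df.iat[4, 5] = <new value>.

Since positions are counted from 0, the 5th row is at position 4 and the 6th column is at position 5.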

Question 16

Write statement(s) to change the values to 750 at 4th row to 9th row, 7th column in a DataFrame df.

Answer

The statement to change the value to 750 at 4th row to 9th row, 7th column in a DataFrame df is:

df.iloc[3:9, 6] = 750.
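Both position-based assignments can be verified on a small DataFrame of zeros. This is a sketch under the assumption of a 10 x 8 DataFrame; positions start at 0, so the 5th row and 6th column are at positions 4 and 5, and the slice 3:9 covers the 4th to 9th rows.

```python
import numpy as np
import pandas as pd

# Hypothetical 10-row, 8-column DataFrame filled with zeros.
df = pd.DataFrame(np.zeros((10, 8), dtype=int))

# 5th row, 6th column (positions 4 and 5).
df.iat[4, 5] = 99

# 4th to 9th rows (positions 3..8), 7th column (position 6).
df.iloc[3:9, 6] = 750

print(df)
```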

Question 17

What is the difference between iloc and loc with respect to a DataFrame ?

Answer

iloc method | loc method
iloc is used for integer-based indexing. | loc is used for label-based indexing.
It allows to access rows and columns using integer indices, where the first row or column has an index of 0. | It allows to access rows and columns using their labels (index or column names).
With iloc, the end index/position in slices is excluded when given as start:end. | With loc, both the start label and end label are included when given as start:end.
The syntax is df.iloc[row_index, column_index]. | The syntax is df.loc[row_label, column_label].
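The difference in how slices treat their end point can be seen in the following sketch. The DataFrame, its column names and its labels are assumptions made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'qty': [5, 8, 2, 9], 'price': [10.0, 12.5, 7.0, 3.5]},
    index=['w', 'x', 'y', 'z'],
)

# loc: label slice, both 'x' and 'y' are included.
by_label = df.loc['x':'y', 'qty']

# iloc: position slice, position 3 is excluded, so rows 1 and 2 are returned.
by_position = df.iloc[1:3, 0]

# With the same numeric span, iloc stops one row earlier.
one_row = df.iloc[1:2]

print(by_label)
print(by_position)
print(one_row)
```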

Question 18

What is the difference between iat and at with respect to a DataFrame ?

Answer

iat method | at method
iat is used for integer-based indexing. | at is used for label-based indexing.
It allows to access a single value in the DataFrame by specifying the row and column indices using integers. | It allows to access a single value in the DataFrame by specifying the row and column labels (index or column names).
The syntax is df.iat[row_index, col_index]. | The syntax is df.at[row_label, col_label].
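A short sketch shows that at and iat reach the same cell through different addressing. The DataFrame and its labels are assumed for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'qty': [5, 8], 'price': [10.0, 12.5]},
    index=['pen', 'ink'],
)

# Same cell, addressed by labels and by positions.
label_value = df.at['ink', 'price']
position_value = df.iat[1, 1]

# Both accessors can also assign a single value.
df.at['pen', 'qty'] = 6

print(label_value, position_value)
print(df)
```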

Question 19

How would you delete columns from a dataframe ?

Answer

To delete columns from a dataframe, we can use either the del statement or the drop() function, with the syntax:

del <Df object>[<column name>]

OR

df.drop([<column name>], axis = 1).

For example, the statement to delete columns A, B from a dataframe df is del df['A'] and del df['B'] or df.drop(['A', 'B'], axis = 1).

Question 20

How would you delete rows from a dataframe ?

Answer

To delete rows from a dataframe, we use the drop() function with the syntax:

<DF>.drop(sequence of indexes).

For example, the statement to delete the rows with indexes 2, 3, 4 from a dataframe df is df.drop([2, 3, 4]).

Question 21

Which function would you use to rename the index/column names in a dataframe ?

Answer

The rename() function in pandas is used to rename index or column names in a DataFrame.
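As a brief illustration, rename() accepts dictionaries that map old names to new names through its index and columns parameters. The DataFrame and the new names below are assumptions chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Map old labels to new ones; labels not mentioned stay unchanged.
renamed = df.rename(index={0: 'first', 1: 'second'}, columns={'A': 'alpha'})

print(renamed)

# rename() returns a new DataFrame; df keeps its old names unless
# inplace=True is passed or the result is assigned back.
print(df.columns.tolist())
```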

Type B: Short Answer Questions/Conceptual Questions

Question 1

Consider following Series object namely S :

0    0.430271
1    0.617328
2   -0.265421
3   -0.836113
dtype: float64

What will be returned by following statements ?

(a) S * 100

(b) S > 0

(c) S1 = pd.Series(S)

(d) S2 = pd.Series(S1) + 3

What will be the values of Series objects S1 and S2 created above ?

Answer

(a) S * 100

Output
0    43.0271
1    61.7328
2   -26.5421
3   -83.6113
dtype: float64

(b) S > 0

Output
0     True
1     True
2    False
3    False
dtype: bool

(c) S1 = pd.Series(S)

Output
0    0.430271
1    0.617328
2   -0.265421
3   -0.836113
dtype: float64

(d) S2 = pd.Series(S1) + 3

Output
0    3.430271
1    3.617328
2    2.734579
3    2.163887
dtype: float64

The values of Series object S1 created above are as follows:

0    0.430271
1    0.617328
2   -0.265421
3   -0.836113
dtype: float64

The values of Series object S2 created above are as follows:

0    3.430271
1    3.617328
2    2.734579
3    2.163887
dtype: float64

Question 2

Consider the same Series object, S, given in the previous question. What output will be produced by following code fragment ?

S.index = ['AMZN', 'AAPL', 'MSFT', 'GOOG']
print(S) 
print(S['AMZN'])
S['AMZN'] = 1.5
print(S['AMZN'])
print(S)

Answer

Output
AMZN    0.430271
AAPL    0.617328
MSFT   -0.265421
GOOG   -0.836113
dtype: float64
0.430271
1.5
AMZN    1.500000
AAPL    0.617328
MSFT   -0.265421
GOOG   -0.836113
dtype: float64
Explanation

The provided code fragment first changes the index labels of the Series S to ['AMZN', 'AAPL', 'MSFT', 'GOOG'], prints the modified Series S, and then proceeds to print and modify the value corresponding to the 'AMZN' index. Specifically, it prints the value at the 'AMZN' index before and after assigning a new value of 1.5 to that index. Finally, it prints the Series S again, showing the updated value at the 'AMZN' index.

Question 3

What will be the output produced by the following code ?

Stationery = ['pencils', 'notebooks', 'scales', 'erasers']
S = pd.Series([20, 33, 52, 10], index = Stationery)
S2 = pd.Series([17, 13, 31, 32], index = Stationery)
print(S + S2)
S = S + S2
print(S + S2)

Answer

Output
pencils      37
notebooks    46
scales       83
erasers      42
dtype: int64
pencils       54
notebooks     59
scales       114
erasers       74
dtype: int64
Explanation

The code creates two Pandas Series, S and S2. It then prints the result of adding these two Series element-wise based on their corresponding indices. After updating S by adding S and S2, it prints the result of adding updated S and S2 again.

Question 4

What will be the output produced by following code, considering the Series object S given above ?

(a) print(S[1:1])

(b) print(S[0:1])

(c) print(S[0:2])

(d)

S[0:2] = 12   
print(S)  

(e)

print(S.index)    
print(S.values)  

Answer

(a)

Output
Series([], dtype: int64)
Explanation

The slice S[1:1] starts at index 1 and ends at index 1, but because the end index is exclusive, it does not include any elements, resulting in an empty Series.

(b)

Output
pencils    20
dtype: int64
Explanation

The slice S[0:1] starts at index 0 and ends at index 1, but because the end index is exclusive, it includes only one element i.e., the element at index 0.

(c)

Output
pencils      20
notebooks    33
dtype: int64
Explanation

The slice S[0:2] starts at position 0 and stops before position 2, so it includes two elements, those at positions 0 and 1.

(d)

Output
pencils      12
notebooks    12
scales       52
erasers      10
dtype: int64
Explanation

The slice S[0:2] = 12 assigns the value 12 to indices 0 and 1 in Series S, directly modifying those elements. The updated Series is then printed.

(e)

Output
Index(['pencils', 'notebooks', 'scales', 'erasers'], dtype='object')
[20 33 52 10]
Explanation

The code print(S.index) displays the indices of Series S, while print(S.values) displays the values of Series.

Question 5

Write a Python program to create a series object, country using a list that stores the capital of each country.

Note. Assume four countries to be used as index of the series object are India, UK, Denmark and Thailand having their capitals as New Delhi, London, Copenhagen and Bangkok respectively.

Solution
import pandas as pd
capitals = ['New Delhi', 'London', 'Copenhagen', 'Bangkok']
countries = ['India', 'UK', 'Denmark', 'Thailand']
country = pd.Series(capitals, index=countries)
print(country)
Output
India        New Delhi
UK              London
Denmark     Copenhagen
Thailand       Bangkok
dtype: object

Question 6(a)

Find the error in following code fragment :

S2 = pd.Series([101, 102, 102, 104])   
print(S2.index)  
S2.index = [0, 1, 2, 3, 4, 5]  
S2[5] = 220  
print(S2)  

Answer

S2 = pd.Series([101, 102, 102, 104])  
print(S2.index)  
S2.index = [0, 1, 2, 3, 4, 5]  #Error 1
S2[5] = 220  
print(S2)  

Error 1 — The Series S2 initially has four elements, so assigning a new index list of six elements ([0, 1, 2, 3, 4, 5]) to S2.index will raise a ValueError because the new index list length does not match the length of the Series.

The corrected code is:

S2 = pd.Series([101, 102, 102, 104])
print(S2.index)
S2.index = [0, 1, 2, 3]
S2[5] = 220
print(S2)
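The behaviour described above can be observed with the following sketch, which catches the error instead of stopping. It also shows what S2[5] = 220 does in the corrected code: because no element with label 5 exists, a new element is added to the Series.

```python
import pandas as pd

S2 = pd.Series([101, 102, 102, 104])

# Six labels for four values: pandas refuses the assignment.
try:
    S2.index = [0, 1, 2, 3, 4, 5]
    index_error = None
except ValueError as exc:
    index_error = str(exc)

print(index_error)

# Label 5 is not present, so this assignment adds a fifth element.
S2[5] = 220
print(S2)
```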

Question 6(b)

Find the error in following code fragment :

S = pd.Series(2, 3, 4, 5, index = range(4)) 

Answer

In the above code fragment, the data values should be enclosed in square brackets [] to form a list.

The corrected code is:

S = pd.Series([2, 3, 4, 5], index = range(4))

Question 6(c)

Find the error in following code fragment

S1 = pd.Series(1, 2, 3, 4, index = range(7))  

Answer

In the above code fragment, the data values should be enclosed in square brackets to form a list. In addition, the index range(7) supplies seven labels for only four data values [1, 2, 3, 4]; the length of the index must match the number of data values.

The corrected code is:

S1 = pd.Series([1, 2, 3, 4], index = range(4))

Question 6(d)

Find the error in following code fragment :

S2 = pd.Series([1, 2, 3, 4], index = range(4))

Answer

There is no error in the above code.

Question 7

Find the Error :

data = np.array(['a', 'b', 'c', 'd', 'e', 'f'])  
s = pd.Series(data, index = [100, 101, 102, 103, 104, 105])   
print(s[102, 103, 104] )  

Can you correct the error ?

Answer

The error in the above code is in the line print(s[102, 103, 104]). When accessing elements in a pandas Series using square brackets, we should use a list of index values, not multiple separate index values separated by commas.

The corrected code is:

data = np.array(['a', 'b', 'c', 'd', 'e', 'f'])  
s = pd.Series(data, index = [100, 101, 102, 103, 104, 105])  
print(s[[102, 103, 104]]) 

Question 8

Why does following code cause error ?

s1 = pd.Series(range(1, 15, 3), index = list('abcd'))

Answer

The code causes an error because the length of the data (range(1, 15, 3)) and the length of the index (list('abcd')) do not match. The range(1, 15, 3) generates the sequence [1, 4, 7, 10, 13], which has a length of 5. The list('abcd') generates the list ['a', 'b', 'c', 'd'], which has a length of 4. When creating a pandas Series, the length of the data and the length of the index must be the same.
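The length mismatch can be confirmed directly, as in this sketch. The five-label index used for the working version is an assumption, since the question does not say how the code should be fixed.

```python
import pandas as pd

data = range(1, 15, 3)      # 1, 4, 7, 10, 13
labels = list('abcd')
lengths = (len(data), len(labels))
print(lengths)

# Five values with four labels raises ValueError.
try:
    pd.Series(data, index=labels)
    mismatch = False
except ValueError as exc:
    mismatch = True
    print("ValueError:", exc)

# One possible fix: supply as many labels as there are values.
s1_fixed = pd.Series(data, index=list('abcde'))
print(s1_fixed)
```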

Question 9

Why does following code cause error ?

s1 = pd.Series(range(1, 15, 3), index = list('ababa')) 
print(s1['ab'])

Answer

The statement s1['ab'] raises a KeyError because 'ab' is not a label in the index. The index contains the individual labels 'a' and 'b' (each repeated), but no label 'ab'.
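The sketch below reproduces the error and shows two lookups that do work on this Series. Reading 'ab' as "the labels a and b" is only an assumption about what the code intended.

```python
import pandas as pd

s1 = pd.Series(range(1, 15, 3), index=list('ababa'))

# A duplicated label returns every matching element.
a_values = s1['a']
print(a_values)

# 'ab' is treated as one label, which does not exist.
try:
    s1['ab']
    key_missing = False
except KeyError:
    key_missing = True
print("KeyError raised:", key_missing)

# To select both labels, pass them as a list.
both = s1.loc[['a', 'b']]
print(both)
```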

Question 10

If Ser is a Series type object having 30 values, then how are statements (a), (b) and (c), (d) similar and different ?

(a) print(Ser.head())

(b) print(Ser.head(8))

(c) print(Ser.tail())

(d) print(Ser.tail(11))

Answer

The statements (a), (b), (c) and (d) are all used to view the values from a pandas Series object Ser. However, they differ in the number of values they display.

(a) print(Ser.head()): This statement will display the first 5 values from the Series Ser.

(b) print(Ser.head(8)): This statement will display the first 8 values from the Series Ser.

(c) print(Ser.tail()): This statement will display the last 5 values from the Series Ser.

(d) print(Ser.tail(11)): This statement will display the last 11 values from the Series Ser.
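The four calls can be compared on a 30-value Series. This sketch assumes the values 0 to 29, which the question does not specify.

```python
import pandas as pd

Ser = pd.Series(range(30))

first_five = Ser.head()       # default n = 5
first_eight = Ser.head(8)
last_five = Ser.tail()        # default n = 5
last_eleven = Ser.tail(11)

print(len(first_five), len(first_eight), len(last_five), len(last_eleven))
print(last_eleven.index[0])   # label of the first of the last 11 values
```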

Question 11

What advantages does dataframe offer over series data structure ? If you have similar data stored in multiple series and a single dataframe, which one would you prefer and why ?

Answer

The advantages of using a DataFrame over a Series are as follows:

  1. A DataFrame can have multiple columns, whereas a Series can only have one.
  2. A DataFrame can store data of different types in different columns, whereas a Series can only store data of a single type.
  3. A DataFrame allows operations across several columns at once (for example, adding two columns or aggregating along rows or columns), whereas a Series works on a single sequence of values.
  4. A DataFrame allows to index data using both row and column labels, whereas a Series only allows to index data using a single label.

If there is similar data stored in multiple Series and a single DataFrame, I would prefer to use the DataFrame. This is because a DataFrame allows us to store and manipulate data in a more organized and structured way, and it allows us to perform operations on entire columns. Additionally, a DataFrame allows us to index data using both row and column labels, which makes it easier to access and manipulate data.

Question 12

Create a DataFrame in Python from the given list :

[['Divya', 'HR', 95000], ['Mamta', 'Marketing', 97000], ['Payal', 'IT', 980000], ['Deepak', 'Sales', 79000]]

Also give appropriate column headings as shown below :

     Name Department  Salary
0   Divya         HR   95000
1   Mamta  Marketing   97000
2   Payal         IT  980000
3  Deepak      Sales   79000
Solution
import pandas as pd
data = [['Divya', 'HR', 95000], ['Mamta', 'Marketing', 97000], ['Payal', 'IT', 980000], ['Deepak', 'Sales', 79000]]
df = pd.DataFrame(data, columns=['Name', 'Department', 'Salary'])
print(df)
Output
     Name Department  Salary
0   Divya         HR   95000
1   Mamta  Marketing   97000
2   Payal         IT  980000
3  Deepak      Sales   79000

Question 13

Carefully observe the following code :

import pandas as pd
Year1 = {'Q1': 5000, 'Q2': 8000, 'Q3': 12000, 'Q4': 18000} 
Year2 = {'A': 13000, 'B': 14000, 'C': 12000}
totSales = {1: Year1, 2: Year2}
df = pd.DataFrame(totSales)
print(df)

Answer the following :

(i) List the index of the DataFrame df.

(ii) List the column names of DataFrame df.

Answer

(i) The index of the DataFrame df is: ['Q1', 'Q2', 'Q3', 'Q4', 'A', 'B', 'C'].

(ii) The column names of the DataFrame df are: [1, 2].

Question 14

Given :

import pandas as pd
d = {'one' : pd.Series([1., 2., 3.], index = ['a', 'b', 'c']), 'two' : pd.Series([1., 2., 3., 4.], index = ['a', 'b', 'c', 'd'])} 
df = pd.DataFrame(d)
df1 = pd.DataFrame(d, index = ['d', 'b', 'a'])
df2 = pd.DataFrame(d, index = ['d', 'a'], columns = ['two', 'three'])
print(df) 
print(df1) 
print(df2)

What will Python show the result as if you execute above code ?

Answer

Output
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0
   one  two
d  NaN  4.0
b  2.0  2.0
a  1.0  1.0
   two three
d  4.0   NaN
a  1.0   NaN
Explanation

The given code creates three pandas DataFrames df, df1, and df2 using the same dictionary d with different index and column labels. The first DataFrame df is created using the dictionary d with index labels taken from the index of the Series objects in the dictionary. The resulting DataFrame has two columns 'one' and 'two' with index labels 'a', 'b', 'c', and 'd'. The values in the DataFrame are filled in accordance to the index and column labels. The second DataFrame df1 is created with the same dictionary d but with a custom index ['d', 'b', 'a']. The third DataFrame df2 is created with a custom index ['d', 'a'] and a custom column label ['two', 'three']. Since the dictionary d does not have a column label three, all its values are NaN (Not a Number), indicating missing data.

Question 15(a)

From the DataFrames created in previous question, write code to display only row 'a' from DataFrames df, df1, and df2.

Solution
import pandas as pd
d = {'one' : pd.Series([1., 2., 3.], index = ['a', 'b', 'c']), 'two' : pd.Series([1., 2., 3., 4.], index = ['a', 'b', 'c', 'd'])} 
df = pd.DataFrame(d)
df1 = pd.DataFrame(d, index = ['d', 'b', 'a'])
df2 = pd.DataFrame(d, index = ['d', 'a'], columns = ['two', 'three'])
print(df.loc['a',:])
print(df1.loc['a',:])
print(df2.loc['a',:])
Output
one    1.0
two    1.0
Name: a, dtype: float64
one    1.0
two    1.0
Name: a, dtype: float64
two      1.0
three    NaN
Name: a, dtype: object

Question 15(b)

From the DataFrames created in previous question, write code to display only rows 0 and 1 from DataFrames df, df1, and df2.

Solution
import pandas as pd
d = {'one' : pd.Series([1., 2., 3.], index = ['a', 'b', 'c']), 'two' : pd.Series([1., 2., 3., 4.], index = ['a', 'b', 'c', 'd'])} 
df = pd.DataFrame(d)
df1 = pd.DataFrame(d, index = ['d', 'b', 'a'])
df2 = pd.DataFrame(d, index = ['d', 'a'], columns = ['two', 'three'])
print(df.iloc[0:2])
print(df1.iloc[0:2])
print(df2.iloc[0:2])
Output
   one  two
a  1.0  1.0
b  2.0  2.0
   one  two
d  NaN  4.0
b  2.0  2.0
   two three
d  4.0   NaN
a  1.0   NaN

Question 15(c)

From the DataFrames created in previous question, write code to display only rows 'a' and 'b' for columns 1 and 2 from DataFrames df, df1 and df2.

Solution
import pandas as pd
d = {'one' : pd.Series([1., 2., 3.], index = ['a', 'b', 'c']), 'two' : pd.Series([1., 2., 3., 4.], index = ['a', 'b', 'c', 'd'])} 
df = pd.DataFrame(d)
df1 = pd.DataFrame(d, index = ['d', 'b', 'a'])
df2 = pd.DataFrame(d, index = ['d', 'a'], columns = ['two', 'three'])
print(df.loc['a' : 'b', :])
print(df1.loc['b' : 'a', :])
print(df2.loc['d' : 'a', :])
Output
   one  two
a  1.0  1.0
b  2.0  2.0
   one  two
b  2.0  2.0
a  1.0  1.0
   two three
d  4.0   NaN
a  1.0   NaN

Question 15(d)

From the DataFrames created in previous question, write code to add an empty column 'x' to all DataFrames.

Solution
import pandas as pd
d = {'one' : pd.Series([1., 2., 3.], index = ['a', 'b', 'c']), 'two' : pd.Series([1., 2., 3., 4.], index = ['a', 'b', 'c', 'd'])} 
df = pd.DataFrame(d)
df1 = pd.DataFrame(d, index = ['d', 'b', 'a'])
df2 = pd.DataFrame(d, index = ['d', 'a'], columns = ['two', 'three'])
df['x'] = None
df1['x'] = None
df2['x'] = None
print(df)
print(df1)
print(df2)
Output
   one  two     x
a  1.0  1.0  None
b  2.0  2.0  None
c  3.0  3.0  None
d  NaN  4.0  None
   one  two     x
d  NaN  4.0  None
b  2.0  2.0  None
a  1.0  1.0  None
   two three     x
d  4.0   NaN  None
a  1.0   NaN  None

Question 16

What will be the output of the following program ?

import pandas as pd
dic = {'Name' : ['Sapna', 'Anmol', 'Rishul', 'Sameep'], 'Agg' : [56, 67, 75, 76], 'Age' : [16, 18, 16, 19]}
df = pd.DataFrame(dic, columns = ['Name', 'Age'])
print(df)

(a)

    Name   Agg Age
101 Sapna  56  16
102 Anmol  67  18
103 Rishul 75  16
104 Sameep 76  19

(b)

    Name   Agg   Age
0   Sapna  56    16
1   Anmol  67    18
2   Rishul  75   16
3   Sameep  76   19

(c)

    Name 
0   Sapna 
1   Anmol 
2   Rishul
3   Sameep 

(d)

    Name   Age
0   Sapna  16
1   Anmol  18
2   Rishul  16
3   Sameep  19

Answer

(d)

Output
     Name  Age
0   Sapna   16
1   Anmol   18
2  Rishul   16
3  Sameep   19
Explanation

The code creates a DataFrame df with columns 'Name' and 'Age' using a dictionary. It contains data about individual's names and ages. The DataFrame is then printed, displaying the specified columns.

Question 17

Predict the output of following code (it uses below given dictionary my_di).

my_di = {"name" : ["Jiya", "Tim", "Rohan"],
         "age" : np.array([10, 15, 20]),
         "weight" : (75, 123, 239),
         "height" : [4.5, 5, 6.1],
         "siblings" : 1,
         "gender" : "M"}
df = pd.DataFrame(my_di)
print(df)

Answer

Output
    name  age  weight  height  siblings gender
0   Jiya   10      75     4.5         1      M
1    Tim   15     123     5.0         1      M
2  Rohan   20     239     6.1         1      M
Explanation

The given code creates a dictionary my_di. Then, a DataFrame df is created using the pd.DataFrame() constructor and passing the my_di dictionary. The print() function is used to display the DataFrame.

Question 18

Consider the same dictionary my_di in the previous question (shown below), what will be the output produced by following code ?

my_di = {"name" : ["Jiya", "Tim", "Rohan"],
         "age" : np.array([10, 15, 20]),
         "weight" : (75, 123, 239),
         "height" : [4.5, 5, 6.1],
         "siblings" : 1,
         "gender" : "M"}
df2 = pd.DataFrame(my_di, index = my_di["name"])
print(df2)

Answer

Output
        name  age  weight  height  siblings gender
Jiya    Jiya   10      75     4.5         1      M
Tim      Tim   15     123     5.0         1      M
Rohan  Rohan   20     239     6.1         1      M
Explanation

The given code creates a dictionary my_di. Then, a DataFrame df2 is created using the pd.DataFrame() constructor and passing the my_di dictionary and the my_di["name"] list as the index. The print() function is used to display the DataFrame.

Question 19

Assume that required libraries (panda and numpy) are imported and dataframe df2 has been created as per questions 17 and 18 above. Predict the output of following code fragment :

print(df2["weight"])
print(df2.weight['Tim'])

Answer

Output
Jiya      75
Tim      123
Rohan    239
Name: weight, dtype: int64
123
Explanation

The given code creates a dictionary my_di. Then, a DataFrame df2 is created using the pd.DataFrame() constructor and passing the my_di dictionary and the my_di["name"] list as the index. The print() function is used to display the 'weight' column of the DataFrame df2 and the value of the 'weight' column for the row with index 'Tim'.

Question 20

Assume that required libraries (panda and numpy) are imported and dataframe df2 has been created as per questions 17 and 18 above. Predict the output of following code fragment :

df2["IQ"] = [130, 105, 115] 
df2["Married"] = False
print(df2)

Answer

Output
        name  age  weight  height  siblings gender   IQ  Married
Jiya    Jiya   10      75     4.5         1      M  130    False
Tim      Tim   15     123     5.0         1      M  105    False
Rohan  Rohan   20     239     6.1         1      M  115    False
Explanation

The code adds two new columns to DataFrame df2: "IQ" with the values [130, 105, 115] and "Married" with the Boolean value False for all rows. It then prints the DataFrame.

Question 21

Assume that required libraries (panda and numpy) are imported and dataframe df2 has been created as per questions 17 and 18 above. Predict the output produced by following code fragment :

df2["College"] = pd.Series(["IIT"], index=["Rohan"]) 
print(df2)

Answer

Output
        name  age  weight  height  siblings gender College
Jiya    Jiya   10      75     4.5         1      M     NaN
Tim      Tim   15     123     5.0         1      M     NaN
Rohan  Rohan   20     239     6.1         1      M     IIT
Explanation

The code adds a new column "College" to the DataFrame df2 by assigning a Series. Pandas aligns the Series with the DataFrame on index labels, so "IIT" is stored only in the row labelled "Rohan", and the rows "Jiya" and "Tim", which have no matching label in the Series, get NaN.
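The index alignment can be checked on a reduced version of df2. The sketch keeps only an 'age' column, which is a simplification of the DataFrame in the question.

```python
import pandas as pd

df2 = pd.DataFrame({'age': [10, 15, 20]}, index=['Jiya', 'Tim', 'Rohan'])

# The Series has a value only for the label 'Rohan'.
df2['College'] = pd.Series(['IIT'], index=['Rohan'])

print(df2)

# Rows without a matching label are left as missing values.
missing_count = int(df2['College'].isna().sum())
print(missing_count)
```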

Question 22

Assume that required libraries (panda and numpy) are imported and dataframe df2 has been created as per questions 17 and 18 above. Predict the output produced by following code fragment :

print(df2.loc["Jiya"])
print(df2.loc["Jiya", "IQ"])
print(df2.loc["Jiya":"Tim", "IQ":"College"]) 
print(df2.iloc[0])
print(df2.iloc[0, 5])
print(df2.iloc[0:2, 5:8])

Answer

Output
name        Jiya
age           10
weight        75
height       4.5
siblings       1
gender         M
IQ           130
College      NaN
Name: Jiya, dtype: object
130
       IQ College
Jiya  130     NaN
Tim   105     NaN
name        Jiya
age           10
weight        75
height       4.5
siblings       1
gender         M
IQ           130
College      NaN
Name: Jiya, dtype: object
M
     gender   IQ College
Jiya      M  130     NaN
Tim       M  105     NaN
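Explanation

The output assumes that df2 contains the "IQ" column from Question 20 and the "College" column from Question 21, but not the "Married" column.
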
  1. print(df2.loc["Jiya"]) — This line prints all columns of the row with the index "Jiya".
  2. print(df2.loc["Jiya", "IQ"]) — This line prints the value of the "IQ" column for the row with the index "Jiya".
  3. print(df2.loc["Jiya":"Tim", "IQ":"College"]) — This line prints a subset of rows and columns using labels, from "Jiya" to "Tim" for rows and from "IQ" to "College" for columns.
  4. print(df2.iloc[0]) — This line prints all columns of the first row using integer-based indexing (position 0).
  5. print(df2.iloc[0, 5]) — This line prints the value of the 6th column for the first row using integer-based indexing.
  6. print(df2.iloc[0:2, 5:8]) — This line prints a subset of rows and columns using integer-based indexing, selecting rows from position 0 to 1 and columns from position 5 to 7.

Question 23

What is the output of the following code ?

d = {'col1': [1, 4, 3 ], 'col2': [6, 7, 8], 'col3': [9, 0, 1]}
df = pd.DataFrame(d)
print("Original DataFrame")
print(df)
print("New DataFrame :")
dfn = df.drop(df.index[[1, 2]])
print(dfn)

Answer

Output
Original DataFrame
   col1  col2  col3
0     1     6     9
1     4     7     0
2     3     8     1
New DataFrame :
   col1  col2  col3
0     1     6     9
Explanation

The code creates a DataFrame using the pandas library in Python, named df, with three columns ('col1', 'col2', 'col3') and three rows of data. The DataFrame df is printed, and then a new DataFrame named dfn is created by dropping the rows with indices 1 and 2 from the original DataFrame using df.drop(df.index[[1, 2]]). The resulting DataFrame, dfn, contains only the first row from the df DataFrame, removing rows 2 and 3.

Question 24

What is the output of the following code ?

data = {'age': [20, 23, 22], 'name': ['Ruhi', 'Ali', 'Sam']} 
df1 = pd.DataFrame(data, index=[1, 2, 3])
print("Before")
print(df1)
df1['Edu'] = ['BA', 'BE' , 'MBA']
print('After')
print(df1)

Answer

Output
Before
   age  name
1   20  Ruhi
2   23   Ali
3   22   Sam
After
   age  name  Edu
1   20  Ruhi   BA
2   23   Ali   BE
3   22   Sam  MBA
Explanation

The code utilizes the pandas library in Python to create a DataFrame named df1 using a dictionary data. The df1 DataFrame is printed, showing the initial data. Then, a new column 'Edu' is added to the DataFrame using df1['Edu'] = ['BA', 'BE' , 'MBA']. The updated DataFrame is printed.

Question 25

Consider the given DataFrame 'Genre' :

No  Type         Code
0   Fiction      F
1   Non-fiction  NF
2   Drama        D
3   Poetry       P

Write suitable Python statements for the following :

(i) Add a column called Num_Copies with the following data : [300, 290, 450, 760].

(ii) Add a new genre of type 'Folk Tale' having code as "FT" and 600 number of copies.

(iii) Rename the column 'Code' to 'Book_Code'.

Answer

(i)

Genre['Num_Copies'] = [300, 290, 450, 760]

(ii)

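Genre = pd.concat([Genre, pd.DataFrame([{'Type': 'Folk Tale', 'Code': 'FT', 'Num_Copies': 600}])], ignore_index=True)

DataFrame.append() was removed in pandas 2.0. On older versions, Genre = Genre.append({'Type': 'Folk Tale', 'Code': 'FT', 'Num_Copies': 600}, ignore_index=True) gives the same result.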

(iii)

Genre.rename(columns = {'Code': 'Book_Code'}, inplace = True)
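The three steps can be run end to end as in the sketch below. It assumes that the 'No' values in the table are the default row index rather than a separate column, and it uses pd.concat() to add the row, which works on current pandas versions.

```python
import pandas as pd

# Assumption: 'No' in the question's table is the row index.
Genre = pd.DataFrame({
    'Type': ['Fiction', 'Non-fiction', 'Drama', 'Poetry'],
    'Code': ['F', 'NF', 'D', 'P'],
})

# (i) Add the Num_Copies column.
Genre['Num_Copies'] = [300, 290, 450, 760]

# (ii) Add the new genre as one extra row.
new_row = pd.DataFrame([{'Type': 'Folk Tale', 'Code': 'FT', 'Num_Copies': 600}])
Genre = pd.concat([Genre, new_row], ignore_index=True)

# (iii) Rename the column 'Code' to 'Book_Code'.
Genre.rename(columns={'Code': 'Book_Code'}, inplace=True)

print(Genre)
```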

Question 26

Write a program in Python Pandas to create the following DataFrame batsman from a Dictionary :

B_NO  Name            Score1  Score2
1     Sunil Pillai    90      80
2     Gaurav Sharma   65      45
3     Piyush Goel     70      90
4     Karthik Thakur  80      76

Perform the following operations on the DataFrame :

(i) Add both the scores of a batsman and assign to column "Total".

(ii) Display the highest score in both Score1 and Score2 of the DataFrame.

(iii) Display the DataFrame.

Answer

import pandas as pd
data = {'B_NO': [1, 2, 3, 4], 'Name': ['Sunil Pillai', 'Gaurav Sharma', 'Piyush Goel', 'Karthik Thakur'], 'Score1': [90, 65, 70, 80], 'Score2': [80, 45, 90, 76]}
batsman = pd.DataFrame(data)
batsman['Total'] = batsman['Score1'] + batsman['Score2']
highest_score1 = batsman['Score1'].max()
highest_score2 = batsman['Score2'].max()
print("Highest score in Score1: ", highest_score1)
print("Highest score in Score2: ", highest_score2)
print(batsman)
Output
Highest score in Score1:  90
Highest score in Score2:  90
   B_NO            Name  Score1  Score2  Total
0     1    Sunil Pillai      90      80    170
1     2   Gaurav Sharma      65      45    110
2     3     Piyush Goel      70      90    160
3     4  Karthik Thakur      80      76    156

Question 27

Consider the following dataframe, and answer the questions given below:

import pandas as pd
df = pd.DataFrame( { "Quarter1": [2000, 4000, 5000, 4400, 10000],  
"Quarter2": [5800, 2500, 5400, 3000, 2900],  
"Quarter3": [20000, 16000, 7000, 3600, 8200],  
"Quarter4": [1400, 3700, 1700, 2000, 6000]})

(i) Write the code to find mean value from above dataframe df over the index and column axis.

(ii) Use sum() function to find the sum of all the values over the index axis.

Answer

(i)

import pandas as pd
df = pd.DataFrame( { "Quarter1": [2000, 4000, 5000, 4400, 10000],  
"Quarter2": [5800, 2500, 5400, 3000, 2900],  
"Quarter3": [20000, 16000, 7000, 3600, 8200],  
"Quarter4": [1400, 3700, 1700, 2000, 6000]})
mean_over_columns = df.sum(axis=1) / df.count(axis=1)
print("Mean over columns: \n", mean_over_columns)

mean_over_rows = df.sum(axis=0) / df.count(axis=0)
print("Mean over rows: \n", mean_over_rows)
Output
Mean over columns: 
0    7300.0
1    6550.0
2    4775.0
3    3250.0
4    6775.0
dtype: float64

Mean over rows:
Quarter1     5080.0
Quarter2     3920.0
Quarter3    10960.0
Quarter4     2960.0
dtype: float64
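The same means can also be obtained with the built-in mean() method. The following is a minimal sketch on the same data; the variable names mean_per_column and mean_per_row are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"Quarter1": [2000, 4000, 5000, 4400, 10000],
                   "Quarter2": [5800, 2500, 5400, 3000, 2900],
                   "Quarter3": [20000, 16000, 7000, 3600, 8200],
                   "Quarter4": [1400, 3700, 1700, 2000, 6000]})

# axis=0 (index axis): walk down the rows, giving one mean per column
mean_per_column = df.mean(axis=0)

# axis=1 (column axis): walk across the columns, giving one mean per row
mean_per_row = df.mean(axis=1)

print(mean_per_column)
print(mean_per_row)
```

Like count(), mean() skips NaN values by default, so on data without missing values both approaches give the same result.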

(ii)

import pandas as pd
df = pd.DataFrame( { "Quarter1": [2000, 4000, 5000, 4400, 10000],  
"Quarter2": [5800, 2500, 5400, 3000, 2900],  
"Quarter3": [20000, 16000, 7000, 3600, 8200],  
"Quarter4": [1400, 3700, 1700, 2000, 6000]})
sum_over_index = df.sum(axis=0)
print("Sum over index (columns):\n", sum_over_index)
Output
Sum over index (columns):
Quarter1    25400
Quarter2    19600
Quarter3    54800
Quarter4    14800
dtype: int64

Question 28

Write the use of the rename(mapper = <dict-like>, axis = 1) method for a Pandas Dataframe. Can the mapper and columns parameter be used together in a rename() method ?

Answer

The rename() method of a pandas DataFrame changes row or column labels. In rename(mapper = <dict-like>, axis = 1), the mapper is a dict-like object (or a function) that maps old labels to new labels, and axis specifies which labels it applies to: axis = 1 (or 'columns') renames columns, while axis = 0 (or 'index') renames rows. Labels that do not appear in the mapper are left unchanged. For example, df.rename(mapper = {'Code': 'Book_Code'}, axis = 1) renames the column 'Code' to 'Book_Code'.

No, the mapper and columns parameters cannot be used together in a rename() call. They are alternative calling conventions: rename(mapper = d, axis = 1) is equivalent to rename(columns = d), and rename(mapper = d, axis = 0) is equivalent to rename(index = d). If mapper is passed along with index or columns, pandas raises a TypeError. Like mapper, the columns parameter takes a dict-like object or a function, not a list of new names.
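How rename() treats these parameters can be checked with a short sketch. The two-row DataFrame and the names mapping, via_mapper, via_columns and combined_allowed are made up for illustration. On the pandas versions this sketch assumes, passing mapper together with columns is rejected with a TypeError.

```python
import pandas as pd

# Hypothetical sample data for the demonstration
df = pd.DataFrame({'Code': ['F', 'NF'], 'Type': ['Fiction', 'Non-fiction']})
mapping = {'Code': 'Book_Code'}

# Two equivalent ways to rename a column
via_mapper = df.rename(mapper=mapping, axis=1)
via_columns = df.rename(columns=mapping)
print(via_mapper.columns.tolist())
print(via_columns.columns.tolist())

# Passing mapper together with columns
try:
    df.rename(mapper=mapping, columns=mapping)
    combined_allowed = True
except TypeError as err:
    combined_allowed = False
    print("TypeError:", err)
```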

Question 29

Find the error in the following code ? Suggest the solution.

>>> topDf
        RollNo   Name    Marks
Sec A   115      Pavni   97.5
Sec B   236      Rishi   98.0
Sec C   307      Preet   98.5
Sec D   422      Paula   98.0
topDf.del['Sec D']

Answer

The error is that topDf.del['Sec D'] is not valid syntax: del is a Python statement, not a DataFrame method, and the del statement can remove only a column (for example, del topDf['Marks']), not a row. To delete a row, use the drop() method with the index label of the row. drop() returns a new DataFrame and leaves topDf unchanged unless the result is assigned back or inplace = True is passed.

The corrected code is:

>>> topDf.drop(['Sec D'])
Output
       RollNo   Name  Marks
Sec A     115  Pavni   97.5
Sec B     236  Rishi   98.0
Sec C     307  Preet   98.5
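The difference between drop() and del can be seen in the following minimal sketch, which rebuilds topDf from the values shown in the question. The name without_d is introduced here for illustration only.

```python
import pandas as pd

topDf = pd.DataFrame({'RollNo': [115, 236, 307, 422],
                      'Name': ['Pavni', 'Rishi', 'Preet', 'Paula'],
                      'Marks': [97.5, 98.0, 98.5, 98.0]},
                     index=['Sec A', 'Sec B', 'Sec C', 'Sec D'])

# drop() returns a new DataFrame; topDf itself still has four rows
without_d = topDf.drop('Sec D')
print(without_d)
print(len(topDf))

# del removes a column by its label, modifying topDf in place
del topDf['Marks']
print(topDf.columns.tolist())
```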

Question 30

Find the error in the following code considering the same dataframe topDf given in the previous question.

(i) topDf.rename(index=['a', 'b', 'c', 'd'])

(ii) topDf.rename(columns = {})

Answer

(i) The index argument of rename() must be a dict-like object or a function that maps the current labels to new ones. Passing a list such as ['a', 'b', 'c', 'd'] raises a TypeError. In addition, rename() returns a new DataFrame, so the result must be assigned back to topDf, or inplace = True must be used, for the change to persist.

The corrected code is:

topDf.rename(index={'Sec A': 'a', 'Sec B': 'b', 'Sec C': 'c', 'Sec D': 'd'}, inplace = True)

(ii) The line topDf.rename(columns={}) attempts to rename columns in the DataFrame topDf, but it provides an empty dictionary {} for renaming, which will not perform any renaming. We need to provide a mapping dictionary with old column names as keys and new column names as values. To modify topDf directly, it should use the inplace = True parameter.

The corrected code is:

topDf.rename(columns={'RollNo': 'NewRollNo', 'Name': 'NewName', 'Marks': 'NewMarks'}, inplace = True)
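Both cases can be tried out with the sketch below, which rebuilds topDf from the previous question. The names list_accepted, unchanged and relabelled are illustrative. Assigning a list to the index of a copy is shown as one possible alternative when every row label has to be replaced.

```python
import pandas as pd

topDf = pd.DataFrame({'RollNo': [115, 236, 307, 422],
                      'Name': ['Pavni', 'Rishi', 'Preet', 'Paula'],
                      'Marks': [97.5, 98.0, 98.5, 98.0]},
                     index=['Sec A', 'Sec B', 'Sec C', 'Sec D'])

# A list is neither dict-like nor a function, so rename() cannot apply it
try:
    topDf.rename(index=['a', 'b', 'c', 'd'])
    list_accepted = True
except TypeError as err:
    list_accepted = False
    print("TypeError:", err)

# An empty mapping is accepted but renames nothing
unchanged = topDf.rename(columns={})
print(unchanged.columns.tolist())

# To replace every row label from a list, assign to the index of a copy
relabelled = topDf.copy()
relabelled.index = ['a', 'b', 'c', 'd']
print(relabelled.index.tolist())
```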

Type C: Long Answer Questions

Question 1

Write Python code to create a Series object Temp1 that stores temperatures of seven days in it. Take any random seven temperatures.

Solution
import pandas as pd
temperatures = [28.0, 30.4, 26.5, 29.4, 27.0, 31.2, 25.8]
Temp1 = pd.Series(temperatures)
print(Temp1)
Output
0    28.0
1    30.4
2    26.5
3    29.4
4    27.0
5    31.2
6    25.8
dtype: float64

Question 2

Write Python code to create a Series object Temp2 storing temperatures of seven days of week. Its indexes should be 'Sunday', 'Monday',... 'Saturday'.

Solution
import pandas as pd
temperatures = [28.9, 30.1, 26.2, 29.3, 27.5, 31.9, 25.5]
days_of_week = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
Temp2 = pd.Series(temperatures, index = days_of_week)
print(Temp2)
Output
Sunday       28.9
Monday       30.1
Tuesday      26.2
Wednesday    29.3
Thursday     27.5
Friday       31.9
Saturday     25.5
dtype: float64

Question 3

A series object (say T1) stores the average temperature recorded on each day of a month. Write code to display the temperatures recorded on :

(i) first 7 days

(ii) last 7 days.

Solution
import pandas as pd
T1 = pd.Series([25.6, 26.3, 27.9, 28.2, 29.1, 30.9, 31.2, 32.4, 33.2, 34.4, 33.3, 32.5, 31.4, 30.7, 29.6, 28.9, 27.0, 26.2, 25.32, 24.34, 23.4, 22.3, 21.6, 20.9, 19.8, 18.1, 17.2, 16.34, 15.5, 14.6])
first_7_days = T1.head(7)
print("Temperatures recorded on the first 7 days:")
print(first_7_days)

last_7_days = T1.tail(7)
print("\nTemperatures recorded on the last 7 days:")
print(last_7_days)
Output
Temperatures recorded on the first 7 days:
0    25.6
1    26.3
2    27.9
3    28.2
4    29.1
5    30.9
6    31.2
dtype: float64

Temperatures recorded on the last 7 days:
23    20.90
24    19.80
25    18.10
26    17.20
27    16.34
28    15.50
29    14.60
dtype: float64

Question 4

Series objects Temp1, Temp2, Temp3, Temp4 store the temperatures of days of week1, week2, week3, week4 respectively.

Write a script to

(a) print the average temperature per week.

(b) print average temperature of entire month.

Solution
import pandas as pd
Temp1 = pd.Series([28.0, 30.2, 26.1, 29.6, 27.7, 31.8, 25.9])  
Temp2 = pd.Series([25.5, 24.5, 23.6, 22.7, 21.8, 20.3, 19.2])  
Temp3 = pd.Series([32.4, 33.3, 34.1, 33.2, 32.4, 31.6, 30.9]) 
Temp4 = pd.Series([27.3, 28.1, 29.8, 30.6, 31.7, 32.8, 33.0]) 

Week_1 = sum(Temp1)
Week_2 = sum(Temp2)
Week_3 = sum(Temp3)
Week_4 = sum(Temp4)

print("Week 1 : Average Temperature is", Week_1 / 7, "degree Celsius")
print("Week 2 : Average Temperature is", Week_2 / 7, "degree Celsius")
print("Week 3 : Average Temperature is", Week_3 / 7, "degree Celsius")
print("Week 4 : Average Temperature is", Week_4 / 7, "degree Celsius")

total = Week_1 + Week_2 + Week_3 + Week_4
print("\nAverage temperature of entire month:", total / 28, "degree Celsius")
Output
Week 1 : Average Temperature is 28.47142857142857 degree Celsius
Week 2 : Average Temperature is 22.514285714285712 degree Celsius
Week 3 : Average Temperature is 32.55714285714286 degree Celsius
Week 4 : Average Temperature is 30.47142857142857 degree Celsius

Average temperature of entire month: 28.503571428571426 degree Celsius
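As an alternative sketch, the averages can be computed with the mean() method of Series, joining the four weeks with pd.concat() for the monthly figure. The names weeks, weekly_means, month and monthly_mean are introduced here, and the rounding to two decimals is an optional choice.

```python
import pandas as pd

Temp1 = pd.Series([28.0, 30.2, 26.1, 29.6, 27.7, 31.8, 25.9])
Temp2 = pd.Series([25.5, 24.5, 23.6, 22.7, 21.8, 20.3, 19.2])
Temp3 = pd.Series([32.4, 33.3, 34.1, 33.2, 32.4, 31.6, 30.9])
Temp4 = pd.Series([27.3, 28.1, 29.8, 30.6, 31.7, 32.8, 33.0])

weeks = [Temp1, Temp2, Temp3, Temp4]

# mean() divides by the actual number of values, so 7 is not hard-coded
weekly_means = [week.mean() for week in weeks]
for number, mean in enumerate(weekly_means, start=1):
    print("Week", number, ": Average Temperature is", round(mean, 2), "degree Celsius")

# Join the four weeks into one 28-value Series for the monthly mean
month = pd.concat(weeks, ignore_index=True)
monthly_mean = month.mean()
print("Average temperature of entire month:", round(monthly_mean, 2), "degree Celsius")
```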

Question 5

Ekam, a Data Analyst with a multinational brand has designed the DataFrame df that contains the four quarters' sales data of different stores as shown below :

    Store  Qtr1  Qtr2  Qtr3  Qtr4
0  Store1   300   240   450   230
1  Store2   350   340   403   210
2  Store3   250   180   145   160

Answer the following questions :

(i) Predict the output of the following Python statement :

(a) print(df.size)  
(b) print(df[1:3])

(ii) Delete the last row from the DataFrame.

(iii) Write Python statement to add a new column Total_Sales which is the addition of all the 4 quarter sales.

Answer

(i)

(a) print(df.size)

Output
15
Explanation

The size attribute of a DataFrame returns the total number of elements in it, i.e., number of rows × number of columns. Here df has 3 rows and 5 columns, so df.size is 3 × 5 = 15.

(b) print(df[1:3])

Output
    Store  Qtr1  Qtr2  Qtr3  Qtr4
1  Store2   350   340   403   210
2  Store3   250   180   145   160
Explanation

This statement uses slicing to extract rows 1 and 2 from the DataFrame df.

(ii)

df = df.drop(2)
Output
    Store  Qtr1  Qtr2  Qtr3  Qtr4
0  Store1   300   240   450   230
1  Store2   350   340   403   210

(iii)

df['Total_Sales'] = df['Qtr1'] + df['Qtr2'] + df['Qtr3'] + df['Qtr4']
Output
    Store  Qtr1  Qtr2  Qtr3  Qtr4  Total_Sales
0  Store1   300   240   450   230         1220
1  Store2   350   340   403   210         1303
2  Store3   250   180   145   160          735
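Parts (ii) and (iii) can also be written without hard-coding the row label or listing each column in the sum. The following is only a sketch, rebuilding df from the table in the question; the names quarters and trimmed are introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Store': ['Store1', 'Store2', 'Store3'],
                   'Qtr1': [300, 350, 250],
                   'Qtr2': [240, 340, 180],
                   'Qtr3': [450, 403, 145],
                   'Qtr4': [230, 210, 160]})

# Sum across the quarter columns only (Store holds text, so it is left out)
quarters = ['Qtr1', 'Qtr2', 'Qtr3', 'Qtr4']
df['Total_Sales'] = df[quarters].sum(axis=1)
print(df)

# Drop whichever row is last, without hard-coding its label
trimmed = df.drop(df.index[-1])
print(trimmed)
```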

Question 6(i)

Consider the following DataFrame df and answer any four questions from (i)-(v):

rollno  name              UT1  UT2  UT3  UT4
1       Prerna Singh      24   24   20   22
2       Manish Arora      18   17   19   22
3       Tanish Goel       20   22   18   24
4       Falguni Jain      22   20   24   20
5       Kanika Bhatnagar  15   20   18   22
6       Ramandeep Kaur    20   15   22   24

Write down the command that will give the following output :

roll no 6
name    Tanish Goel
UT1     24
UT2     24
UT3     24
UT4     24
dtype : object

(a) print(df.max)

(b) print(df.max())

(c) print(df.max(axis = 1))

(d) print(df.max, axis = 1)

Answer

print(df.max())

Explanation

The df.max() function in pandas is used to find the maximum value in each column of a DataFrame.

Question 6(ii)

Consider the following DataFrame df and answer any four questions from (i)-(v):

rollno  name              UT1  UT2  UT3  UT4
1       Prerna Singh      24   24   20   22
2       Manish Arora      18   17   19   22
3       Tanish Goel       20   22   18   24
4       Falguni Jain      22   20   24   20
5       Kanika Bhatnagar  15   20   18   22
6       Ramandeep Kaur    20   15   22   24

The teacher needs to know the marks scored by the student with roll number 4. Help her identify the correct set of statement/s from the given options:

(a) df1 = df[df['rollno'] == 4]
print(df1)

(b) df1 = df[rollno == 4]
print(df1)

(c) df1 = df.[df.rollno = 4]
print(df1)

(d) df1 = df[df.rollno == 4]
print(df1)

Answer

Both (a) and (d) are correct:

df1 = df[df['rollno'] == 4]
print(df1)

df1 = df[df.rollno == 4]
print(df1)

Explanation

Both statements filter the DataFrame df to include only the rows where the roll number is equal to 4. df['rollno'] and df.rollno refer to the same column, so the two forms are equivalent. The comparison with 4 creates a boolean mask with one True or False value per row, and indexing df with this mask (boolean indexing) keeps only the rows where the mask is True. The resulting DataFrame df1 contains only the row for roll number 4. Option (b) fails because rollno is not defined as a variable on its own, and option (c) is a syntax error because it writes df.[ and uses = instead of ==.
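The boolean mask behind this kind of selection can be inspected directly. The sketch below rebuilds df from the table in the question; the names mask, by_brackets and by_attribute are introduced for illustration.

```python
import pandas as pd

df = pd.DataFrame({'rollno': [1, 2, 3, 4, 5, 6],
                   'name': ['Prerna Singh', 'Manish Arora', 'Tanish Goel',
                            'Falguni Jain', 'Kanika Bhatnagar', 'Ramandeep Kaur'],
                   'UT1': [24, 18, 20, 22, 15, 20],
                   'UT2': [24, 17, 22, 20, 20, 15],
                   'UT3': [20, 19, 18, 24, 18, 22],
                   'UT4': [22, 22, 24, 20, 22, 24]})

# The comparison yields a boolean Series, True only where rollno is 4
mask = df['rollno'] == 4
print(mask.tolist())

# Bracket access and attribute access select the same column
by_brackets = df[df['rollno'] == 4]
by_attribute = df[df.rollno == 4]
print(by_brackets)
```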

Question 6(iii)

Consider the following DataFrame df and answer any four questions from (i)-(v):

rollno  name              UT1  UT2  UT3  UT4
1       Prerna Singh      24   24   20   22
2       Manish Arora      18   17   19   22
3       Tanish Goel       20   22   18   24
4       Falguni Jain      22   20   24   20
5       Kanika Bhatnagar  15   20   18   22
6       Ramandeep Kaur    20   15   22   24

Which of the following statement/s will give the exact number of values in each column of the dataframe ?

(I) print(df.count())
(II) print(df.count(0))
(III) print(df.count)
(IV) print((df.count(axis = 'index')))

Choose the correct option :

(a) both (I) and (II)

(b) only (II)

(c) (I), (II) and (III)

(d) (I), (II) and (IV)

Answer

(I), (II) and (IV)

Explanation

In pandas, the statement df.count() and df.count(0) calculate the number of non-null values in each column of the DataFrame df. The statement df.count(axis='index') specifies the axis parameter as 'index', which is equivalent to specifying axis=0. This means it will count non-null values in each column of the DataFrame df.

Question 6(iv)

Consider the following DataFrame df and answer any four questions from (i)-(v):

rollno  name              UT1  UT2  UT3  UT4
1       Prerna Singh      24   24   20   22
2       Manish Arora      18   17   19   22
3       Tanish Goel       20   22   18   24
4       Falguni Jain      22   20   24   20
5       Kanika Bhatnagar  15   20   18   22
6       Ramandeep Kaur    20   15   22   24

Which of the following command will display the column labels of the DataFrame ?

(a) print(df.columns())

(b) print(df.column())

(c) print(df.column)

(d) print(df.columns)

Answer

print(df.columns)

Explanation

The statement df.columns is used to access the column labels (names) of a DataFrame in pandas.

Question 6(v)

Consider the following DataFrame df and answer any four questions from (i)-(v):

rollno  name              UT1  UT2  UT3  UT4
1       Prerna Singh      24   24   20   22
2       Manish Arora      18   17   19   22
3       Tanish Goel       20   22   18   24
4       Falguni Jain      22   20   24   20
5       Kanika Bhatnagar  15   20   18   22
6       Ramandeep Kaur    20   15   22   24

Ms. Sharma, the class teacher wants to add a new column, the scores of Grade with the values, 'A', 'B', 'A', 'A', 'B', 'A' , to the DataFrame.

Help her choose the command to do so :

(a) df.column = ['A', 'B', 'A', 'A', 'B', 'A']

(b) df['Grade'] = ['A', 'B', 'A', 'A', 'B', 'A']

(c) df.loc['Grade'] = ['A', 'B', 'A', 'A', 'B', 'A']

(d) Both (b) and (c) are correct

Answer

df['Grade'] = ['A', 'B', 'A', 'A', 'B', 'A']

Explanation

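The statement df['Grade'] = [...] creates a new column named 'Grade' in the DataFrame df and fills it with the given values, one per row. The square brackets [] are used to access or create a column in a DataFrame. Option (c) is incorrect because df.loc['Grade'] refers to a row label, so it would add a row named 'Grade' instead of a column.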

Question 7

Write a program that stores the sales of 5 fast moving items of a store for each month in 12 Series objects, i.e., S1 Series object stores sales of these 5 items in 1st month, S2 stores sales of these 5 items in 2nd month, and so on.

The program should display the summary sales report like this :

Total Yearly Sales, item-wise (should display sum of items' sales over the months) 
Maximum sales of item made : <name of item that was sold the maximum in whole year> 
Maximum sales for individual items
Maximum sales of item 1 made : <month in which that item sold the maximum> 
Maximum sales of item 2 made : <month in which that item sold the maximum> 
Maximum sales of item 3 made : <month in which that item sold the maximum> 
Maximum sales of item 4 made : <month in which that item sold the maximum> 
Maximum sales of item 5 made : <month in which that item sold the maximum>
Solution
import pandas as pd
sales_data = {
    'Month_1': pd.Series([300, 250, 200, 150, 350], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
    'Month_2': pd.Series([380, 210, 220, 180, 320], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
    'Month_3': pd.Series([320, 270, 230, 200, 380], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
    'Month_4': pd.Series([310, 260, 210, 190, 360], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
    'Month_5': pd.Series([290, 240, 220, 170, 340], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
    'Month_6': pd.Series([300, 250, 400, 160, 350], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
    'Month_7': pd.Series([310, 260, 230, 180, 370], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
    'Month_8': pd.Series([320, 270, 240, 190, 380], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
    'Month_9': pd.Series([330, 280, 250, 200, 400], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
    'Month_10': pd.Series([340, 290, 260, 510, 420], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
    'Month_11': pd.Series([350, 300, 270, 220, 440], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5']),
    'Month_12': pd.Series([360, 390, 280, 230, 260], index=['Item_1', 'Item_2', 'Item_3', 'Item_4', 'Item_5'])
}

sales_df = pd.DataFrame(sales_data)
print("Total Yearly Sales, item-wise:")
# Rows are items and columns are months, so sum across the columns (axis=1)
total_sales = sales_df.sum(axis=1)
print(total_sales)

max_sales_item = total_sales.idxmax()
print("\nMaximum sales of item made: ", max_sales_item)

print("\nMaximum sales for individual items:")
for item_num in range(1, 6):
    max_sales_month = None
    max_sales_value = 0
    for month in sales_df.columns:
        if sales_df[month][f'Item_{item_num}'] > max_sales_value:
            max_sales_value = sales_df[month][f'Item_{item_num}']
            max_sales_month = month
    print("Maximum sales of item", item_num, "made: ", max_sales_month)
Output
Total Yearly Sales, item-wise:
Item_1    3910
Item_2    3270
Item_3    3010
Item_4    2580
Item_5    4370
dtype: int64

Maximum sales of item made:  Item_5

Maximum sales for individual items:
Maximum sales of item 1 made:  Month_2
Maximum sales of item 2 made:  Month_12
Maximum sales of item 3 made:  Month_6
Maximum sales of item 4 made:  Month_10
Maximum sales of item 5 made:  Month_11
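The month-by-month search for each item's peak can also be expressed with idxmax(). The sketch below uses a small hypothetical sample (two items, three months) laid out like sales_df, with items as rows and months as columns. The names sales, item_totals, best_item and best_month are illustrative.

```python
import pandas as pd

# Hypothetical sample: rows are items, columns are months
sales = pd.DataFrame({'Month_1': [300, 250],
                      'Month_2': [380, 210],
                      'Month_3': [320, 270]},
                     index=['Item_1', 'Item_2'])

item_totals = sales.sum(axis=1)      # one yearly total per item (row)
best_item = item_totals.idxmax()     # label of the item with the largest total
best_month = sales.idxmax(axis=1)    # for each item, the month of its peak sales

print(item_totals)
print("Maximum sales of item made:", best_item)
print(best_month)
```

When two months tie for an item's peak, idxmax() reports the first of them, which matches the strict > comparison in the loop-based version.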

Question 8

Three Series objects store the marks of 10 students in three terms. Roll numbers of students form the index of these Series objects. The Three Series objects have the same indexes.

Calculate the total weighted marks obtained by students as per following formula :

Final marks = 25% Term 1 + 25% Term 2 + 50% Term 3

Store the Final marks of students in another Series object.

Solution
import pandas as pd
term1 = pd.Series([80, 70, 90, 85, 75, 95, 80, 70, 85, 90], index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
term2 = pd.Series([85, 90, 75, 80, 95, 85, 90, 75, 80, 85], index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
term3 = pd.Series([90, 85, 95, 90, 80, 85, 95, 90, 85, 90], index=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

final_marks = (term1 * 0.25) + (term2 * 0.25) + (term3 * 0.50)
print(final_marks)
Output
1     86.25
2     82.50
3     88.75
4     86.25
5     82.50
6     87.50
7     90.00
8     81.25
9     83.75
10    88.75
dtype: float64

Question 9

Write code to print all the information about a Series object.

Solution
import pandas as pd
s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
print(s)
s.info()
Output
a    1
b    2
c    3
d    4
dtype: int64
<class 'pandas.core.series.Series'>
Index: 4 entries, a to d
Series name: None
Non-Null Count  Dtype
--------------  -----
4 non-null      int64
dtypes: int64(1)
memory usage: 64.0+ bytes
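Besides info(), the individual attributes of a Series can be printed one by one. The following sketch shows a selection of commonly used attributes on the same Series s; it is not an exhaustive list.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

print("index   :", s.index.tolist())   # the index labels
print("values  :", s.values)           # the underlying data as an ndarray
print("dtype   :", s.dtype)            # data type of the elements
print("shape   :", s.shape)            # tuple with the number of elements
print("ndim    :", s.ndim)             # number of dimensions (1 for a Series)
print("size    :", s.size)             # number of elements, NaNs included
print("nbytes  :", s.nbytes)           # bytes used by the underlying data
print("hasnans :", s.hasnans)          # True if any element is NaN
print("empty   :", s.empty)            # True if the Series has no elements
```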

Question 10

Write a program to create three different Series objects from the three columns of a DataFrame df.

Solution
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
s1 = df['A']
s2 = df['B']
s3 = df['C']
print(s1)
print(s2)
print(s3)
Output
0    1
1    2
2    3
Name: A, dtype: int64
0    4
1    5
2    6
Name: B, dtype: int64
0    7
1    8
2    9
Name: C, dtype: int64

Question 11

Write a program to create three different Series objects from the three rows of a DataFrame df.

Solution
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
s1 = df.iloc[0]
s2 = df.iloc[1]
s3 = df.iloc[2]
print(s1)
print(s2)
print(s3)
Output
A    1
B    4
C    7
Name: 0, dtype: int64
A    2
B    5
C    8
Name: 1, dtype: int64
A    3
B    6
C    9
Name: 2, dtype: int64

Question 12

Write a program to create a Series object from an ndarray that stores characters from 'a' to 'g'.

Solution
import pandas as pd
import numpy as np
data = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
S = pd.Series(data)
print(S)
Output
0    a
1    b
2    c
3    d
4    e
5    f
6    g
dtype: object

Question 13

Write a program to create a Series object that stores the table of number 5.

Solution
import pandas as pd
import numpy as np
arr = np.arange(1, 11)
s = pd.Series(arr * 5)
print(s)
Output
0     5
1    10
2    15
3    20
4    25
5    30
6    35
7    40
8    45
9    50
dtype: int32

Question 14

Write a program to create a Dataframe that stores two columns, which store the Series objects of the previous two questions (12 and 13).

Solution
import pandas as pd
import numpy as np
data = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
S1 = pd.Series(data)
arr = np.arange(1, 11)
S2 = pd.Series(arr * 5)
df = pd.DataFrame({'Characters': S1, 'Table of 5': S2})

print(df)
Output
  Characters  Table of 5
0          a           5
1          b          10
2          c          15
3          d          20
4          e          25
5          f          30
6          g          35
7        NaN          40
8        NaN          45
9        NaN          50
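The NaN entries appear because the DataFrame aligns the two Series on their index labels: S1 has labels 0 to 6 while S2 has labels 0 to 9, so the Characters column has nothing to place at labels 7, 8 and 9. A minimal sketch to confirm this with len(), count() and isna():

```python
import pandas as pd
import numpy as np

S1 = pd.Series(np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g']))  # labels 0 to 6
S2 = pd.Series(np.arange(1, 11) * 5)                            # labels 0 to 9
df = pd.DataFrame({'Characters': S1, 'Table of 5': S2})

print(len(df))                          # rows in the aligned DataFrame
print(df.count())                       # non-NaN values in each column
print(df['Characters'].isna().sum())    # missing values in Characters
```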

Question 15

Write a program to create a Dataframe storing salesmen details (name, zone, sales) of five salesmen.

Solution
import pandas as pd
salesmen = {'Name': ['Jahangir', 'Janavi', 'Manik', 'Lakshmi', 'Tanisha'], 'Zone': ['North', 'South', 'East', 'West', 'Central'], 'Sales': [5000, 7000, 3000, 8000, 6000]}

df = pd.DataFrame(salesmen)
print(df)
Output
       Name     Zone  Sales
0  Jahangir    North   5000
1    Janavi    South   7000
2     Manik     East   3000
3   Lakshmi     West   8000
4   Tanisha  Central   6000

Question 16

Four dictionaries store the details of four employees-of-the-month as (empno, name). Write a program to create a dataframe from these.

Solution
import pandas as pd
emp1 = {'empno': 1001, 'name': 'Ameesha'}
emp2 = {'empno': 1002, 'name': 'Akruti'}
emp3 = {'empno': 1003, 'name': 'Prithvi'}
emp4 = {'empno': 1004, 'name': 'Rajesh'}

employees = [emp1, emp2, emp3, emp4]
df = pd.DataFrame(employees)
print(df)
Output
   empno     name
0   1001  Ameesha
1   1002   Akruti
2   1003  Prithvi
3   1004   Rajesh

Question 17

A list stores three dictionaries each storing details, (old price, new price, change). Write a program to create a dataframe from it.

Solution
import pandas as pd
prices = [{'old_price': 10, 'new_price': 12, 'change': 2},
          {'old_price': 20, 'new_price': 18, 'change': -2},
          {'old_price': 30, 'new_price': 35, 'change': 5}]
df = pd.DataFrame(prices)
print(df)
Output
   old_price  new_price  change
0         10         12       2
1         20         18      -2
2         30         35       5