Importing/Exporting Data between CSV Files/MySQL and Pandas


Checkpoint 4.1

Question 1

What is a CSV file ? Is it a text file ?

Answer

The acronym CSV is short for Comma-Separated Values. It refers to tabular data saved as plain text, where the data values are separated by commas. Yes, since it stores only plain text, a CSV file is a text file.
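
For example, a small CSV file holding tabular data might look like this (the contents are illustrative):

Name,Class,Marks
Amit,12,85
Neha,12,90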

Question 2

Name the function you would use to read data from a CSV file into a DataFrame.

Answer

The function used to read data from a CSV file into a DataFrame is read_csv().
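
A minimal sketch (data.csv is a hypothetical file in the working directory):

import pandas as pd
df = pd.read_csv("data.csv")   # reads the CSV file into a DataFrame
print(df)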

Question 3

Name the function you would use to store data from a DataFrame into a CSV file.

Answer

The function used to store data from a DataFrame into a CSV file is to_csv().
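
A minimal sketch (result.csv is only an illustrative file name):

import pandas as pd
df = pd.DataFrame({'Item': ['Pen', 'Pencil'], 'Qty': [10, 20]})
df.to_csv("result.csv")   # writes the DataFrame, including its index, to result.csv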

Question 4

Which function lets you fetch data from an SQL table into a DataFrame ?

Answer

The function used to fetch data from an SQL table into a DataFrame is read_sql().
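
A minimal sketch, assuming a MySQL database Mydata with an EMPLOYEE table (the credentials and names are illustrative):

import pandas as pd
import mysql.connector as sqltor
mycon = sqltor.connect(host = "localhost", user = "root", passwd = "MyPass", database = "Mydata")
df = pd.read_sql("SELECT * FROM EMPLOYEE;", mycon)   # fetches the query result into a DataFrame
print(df)
mycon.close()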

Question 5

Which function lets you store data of a DataFrame into an SQL table ?

Answer

The function used to store data of a DataFrame into an SQL table is to_sql().
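
A minimal sketch, assuming an SQLAlchemy connection to a MySQL database Mydata (credentials, table and column names are illustrative):

import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('mysql+pymysql://root:MyPass@localhost/Mydata')
mycon = engine.connect()
df = pd.DataFrame({'Brand': ['Alto', 'Zen'], 'Price': [22000, 25000]})
df.to_sql('CARS', mycon, if_exists = 'replace', index = False)   # writes df as table CARS; index = False leaves out the index column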

Question 6

Which additional argument would you give to append the data of a DataFrame in an existing SQL table ?

Answer

The if_exists = 'append' argument in the to_sql() function of pandas is used to append the data of a DataFrame into the existing SQL table.
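
Continuing the sketch above (the connection mycon and the table CARS are assumed to already exist), appending keeps the existing rows:

more = pd.DataFrame({'Brand': ['Kia'], 'Price': [35000]})   # mycon and CARS come from the previous sketch
more.to_sql('CARS', mycon, if_exists = 'append', index = False)   # new rows are added; old data is retained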

Multiple Choice Questions

Question 1

CSV stands for :

  1. Column Separated Value
  2. Class Separated Value
  3. Comma Separated Value
  4. None of the above

Answer

Comma Separated Value

Reason — The acronym CSV is short for Comma-Separated Value.

Question 2

A CSV file can take ............... character as separator.

  1. ,
  2. ~
  3. |
  4. \t
  5. only (a)
  6. all of these

Answer

all of these

Reason — All of the listed characters can be used as a separator in a CSV file; the default separator is the comma.

Question 3

In order to work with CSV files from pandas, you need to import ............... , other than pandas.

  1. .csv
  2. pandas.io
  3. newcsv
  4. no extra package required

Answer

no extra package required

Reason — Python's Pandas library offers two functions, read_csv() and to_csv(), that help bring data from a CSV file into a DataFrame and write a DataFrame's data to a CSV file. Therefore, we do not need to import any additional packages or modules other than pandas.

Question 4

The correct statement to read from a CSV file in a DataFrame is :

  1. <DF>.read_csv(<file>)
  2. <File>. read_csv( )(<DF>)
  3. <DF> = pandas.read(<file>)
  4. <DF> = pandas.read_csv(<files>)

Answer

<DF> = pandas.read_csv(<files>)

Reason — The statement to read data from a CSV file into a DataFrame is <DF> = pandas.read_csv(<filepath>).

Question 5

Which argument do you specify with read_csv() to specify a separator character ?

  1. character
  2. char
  3. separator
  4. sep

Answer

sep

Reason — The sep argument in the read_csv() function is used to specify the separator character. By default, this is set to a comma (,).
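
A minimal sketch (data.txt is a hypothetical file whose values are separated by '@'):

import pandas as pd
df = pd.read_csv("data.txt", sep = '@')   # '@' is used as the separator instead of the default comma
print(df)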

Question 6

To suppress first row as header, which of the following arguments is to be given in read_csv() ?

  1. noheader = True
  2. header = None
  3. skipheader = True
  4. header = Null

Answer

header = None

Reason — The header = None argument is to be given in read_csv to suppress the first row as the header.
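
A minimal sketch (data.csv is a hypothetical file):

import pandas as pd
df = pd.read_csv("data.csv", header = None)   # the first row is read as data; columns get default names 0, 1, 2, ...
print(df)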

Question 7

To read specific number of rows from a CSV file, which argument is to be given in read_csv() ?

  1. rows = <n>
  2. nrows = <n>
  3. n_rows = <n>
  4. number_rows = <n>

Answer

nrows = <n>

Reason — The argument nrows = <n> is to be given in read_csv() to read the specified number of rows from the CSV file.
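
A minimal sketch (data.csv is a hypothetical file):

import pandas as pd
df = pd.read_csv("data.csv", nrows = 10)   # reads only the first 10 rows of data
print(df)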

Question 8

To skip first 5 rows of CSV file, which argument will you give in read_csv() ?

  1. skiprows = 5
  2. skip_rows = 5
  3. skip = 5
  4. noread = 5

Answer

skiprows = 5

Reason — The skiprows = 5 argument in the read_csv() function is used to skip the first 5 rows of the CSV file and start reading from the 6th row onwards.
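
A minimal sketch (data.csv is a hypothetical file):

import pandas as pd
df = pd.read_csv("data.csv", skiprows = 5)   # the first 5 rows of the file are skipped; a list like [1, 3, 5] skips only those rows
print(df)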

Question 9

To skip 1st, 3rd and 5th row of CSV file, which argument will you give in read_csv() ?

  1. skiprows = 1 | 13 | 5
  2. skiprows = [1, 5, 1]
  3. skiprows = [1, 3, 5]
  4. any of these

Answer

skiprows = [1, 3, 5]

Reason — The argument to skip 1st, 3rd and 5th row of CSV file in read_csv() is skiprows = [1, 3, 5].

Question 10

While reading from a CSV file, to use a column's values as index labels, argument given in read_csv() is :

  1. index
  2. index_col
  3. index_values
  4. index_label

Answer

index_col

Reason — The argument given in read_csv() to use a column's values as index labels is index_col. By specifying index_col with the name of the desired column, pandas will use the values from that column as the index labels for the DataFrame.
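
A minimal sketch (data.csv is a hypothetical file whose first column holds the row labels):

import pandas as pd
df = pd.read_csv("data.csv", index_col = 0)   # values of the first column become the index labels
print(df)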

Question 11

While writing a DataFrame onto a CSV file, which argument would you use in to_csv() for NaN values' representation as NULL ?

  1. NaN = NULL
  2. na_rep = NULL
  3. na_value = NULL
  4. na = NULL

Answer

na_rep = NULL

Reason — The na_rep argument in the to_csv() method is used to specify the representation of NaN values in the resulting CSV file. Using na_rep = NULL means that NaN values will be represented as "NULL" in the CSV file.
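
A minimal sketch (result.csv is an illustrative file name):

import pandas as pd
df = pd.DataFrame({'A': [1, None], 'B': [None, 4]})   # None values become NaN in the DataFrame
df.to_csv("result.csv", na_rep = 'NULL')              # NaN values are written as NULL in the CSV file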

Question 12

Which of the following libraries let you establish a connection with a MySQL database from within Python ?

  1. mysql.connector
  2. Pymysql, sqlalchemy
  3. pandas
  4. numpy

Answer

mysql.connector, Pymysql, sqlalchemy

Reason — The libraries mysql.connector, PyMySQL, and SQLAlchemy all enable us to establish connections with MySQL databases from within Python.

Question 13

In pandas.read_sql(<A>, <B>), <A> is

  1. connection name
  2. table name
  3. SQL query string
  4. database name

Answer

SQL query string

Reason — The <A> parameter in pandas.read_sql(<A>, <B>) represents the SQL query string used to fetch data from the database.

Question 14

In pandas.read_sql(<A>, <B>), <B> is

  1. connection name
  2. table name
  3. SQL query string
  4. database name

Answer

connection name

Reason — The <B> parameter in pandas.read_sql(<A>, <B>) represents the connection object or the database connection.

Question 15

To suppress the creation of a column for index labels of a DataFrame, ............... argument is specified in to_sql().

  1. if_exists = False
  2. index = False
  3. index = True
  4. if_exists = True

Answer

index = False

Reason — The index = False argument in to_sql() is used to suppress the creation of a column for index labels of a DataFrame. This means that the index labels will not be included as a separate column in the database table.

Question 16

To append the content of DataFrame in a table of MySQL, ............... argument is used in to_sql().

  1. if_exists = "Add"
  2. if_exists = "append"
  3. if_exists = Add
  4. if_exists = append

Answer

if_exists = "append"

Reason — The if_exists = "append" argument in to_sql() is used to append the content of a DataFrame to an existing table in MySQL. This ensures that the data from the DataFrame is added to the specified table without overwriting or deleting any existing data.

Fill in the Blanks

Question 1

Full form of CSV is comma separated value.

Question 2

Default separator of CSV files is , (comma).

Question 3

To load data of a CSV file in a DataFrame read_csv function is used.

Question 4

To write data of a DataFrame in a CSV file, to_csv function is used.

Question 5

To specify a separator other than comma in a CSV file, sep argument is used.

Question 6

To specify the string to represent NaN values in a CSV file, na_rep argument in to_csv() is used.

Question 7

To load data in a DataFrame from mysql table, read_sql() function is used.

Question 8

To write data of a DataFrame in a mysql table, to_sql() function is used.

True/False Questions

Question 1

CSV files can only store comma separated values.

Answer

False

Reason — CSV files can store values separated by various delimiters such as semicolons, tabs, spaces, etc. The default delimiter is a comma.

Question 2

The number of rows in a DataFrame is, by default, equal to the number of rows in the CSV file it is created from.

Answer

True

Reason — When we create a DataFrame from a CSV file using pandas, by default, the number of rows in the DataFrame will be equal to the number of rows in the CSV file.

Question 3

Pandas can only read from CSV files but can't write CSV files.

Answer

False

Reason — Python's Pandas library provides two functions, read_csv() and to_csv(), which are used for reading data from a CSV file into a DataFrame and writing data from a DataFrame to a CSV file, respectively.

Question 4

You need to import CSV package in order to store a DataFrame in a CSV file.

Answer

False

Reason — In Python, we don't need to import the CSV package specifically to store a DataFrame in a CSV file when using the Pandas library. The DataFrame.to_csv() method is provided by Pandas itself and handles the conversion of DataFrame data to a CSV file format.

Question 5

The read_csv() can handle different separator characters but not to_csv().

Answer

False

Reason — Both the read_csv() and to_csv() functions in Pandas can handle different separator characters.

Question 6

In order to read from a MySQL table into a DataFrame, the table must exist.

Answer

True

Reason — In order to read data from a MySQL table into a DataFrame using pandas, the table must exist in the MySQL database. If the table doesn't exist, the read operation will fail because there would be no data to retrieve.

Question 7

The to_sql() function can append to a MySQL table.

Answer

True

Reason — The to_sql() function in pandas, when used with the if_exists='append' parameter, can append data to an existing MySQL table.

Question 8

The to_sql() function cannot create a MySQL table.

Answer

False

Reason — The to_sql() function in pandas creates a new MySQL table if the named table does not exist. With the if_exists = 'replace' parameter, it drops any existing table of that name and creates a new one from the DataFrame.

Question 9

The to_sql() function cannot work if the named table already exists.

Answer

False

Reason — The to_sql() function can work even when the named table already exists; the if_exists parameter ('fail', 'replace' or 'append') controls what happens in that case.

Question 10

The read_sql() can also read from CSV files.

Answer

False

Reason — The read_sql() function in pandas can read data from SQL databases using SQL queries. To read from CSV files, we use the read_csv() function in pandas.

Assertions and Reasons

Question 1

Assertion. Python Panda library offers functionality to interact with a CSV file.

Reason. Panda's read_csv() and to_csv() functions can read-from and write-to CSV files.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

Both A and R are true and R is the correct explanation of A.

Explanation
Python's Pandas library offers functionality to interact with a CSV file. It provides two functions, read_csv() and to_csv(), which are used for reading data from a CSV file into a DataFrame and writing data from a DataFrame to a CSV file, respectively.

Question 2

Assertion. The read_csv() function of Python Pandas can read data of a csv file into any of pandas data structures.

Reason. DataFrame is a compatible data structure for read_csv() function.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

A is false but R is true.

Explanation
The read_csv() function of Python Pandas can only read data from a CSV file into the DataFrame data structure of Pandas. DataFrame is a compatible data structure for the read_csv() function, as it is the primary data structure designed to be created by read_csv().

Question 3

Assertion. The read_csv() function reads a csv file's data into a DataFrame.

Reason. The to_csv() function writes a DataFrame on to a csv file.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

Both A and R are true but R is not the correct explanation of A.

Explanation
The read_csv() function in Pandas is used to read data from a CSV file into a DataFrame. Conversely, the to_csv() function is used to write DataFrame data to a CSV file.

Question 4

Assertion. The read_sql() function of Pandas can query upon any mysql database.

Reason. The read_sql() function can query upon only those databases that have a connection established through mysql database connector.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

A is false but R is true.

Explanation
Although the read_sql() function is capable of querying data from MySQL databases, it is limited to those databases for which a connection has been established through a MySQL database connector. Without such a connection, the read_sql() function cannot query data from the MySQL database.

Question 5

Assertion. A DataFrame's data can be exported as a table of a mysql database.

Reason. Over an established connection to a mysql database, <DF>.to_sql() would write the data of the DataFrame <DF> as a table in the mysql database.

  1. Both A and R are true and R is the correct explanation of A.
  2. Both A and R are true but R is not the correct explanation of A.
  3. A is true but R is false.
  4. A is false but R is true.

Answer

Both A and R are true and R is the correct explanation of A.

Explanation
In Pandas, a DataFrame's data can be exported as a table of a mysql database. Over an established connection to a mysql database, the <DF>.to_sql() method is used to write the data of the DataFrame <DF> as a table in the mysql database.

Type A: Short Answer Questions/Conceptual Questions

Question 1

What are advantages of CSV file formats ?

Answer

The advantages of CSV file formats are as follows:

  1. It is a simple, compact and ubiquitous format for data storage.

  2. It is a common format for data interchange.

  3. It can be opened in popular spreadsheet packages like MS-Excel, Calc etc.

  4. Nearly all spreadsheets and databases support import/export to csv format.

Question 2

What all libraries do you require in order to bring data from a CSV file into a DataFrame ?

Answer

Python's Pandas library is required to bring data from a CSV file into a DataFrame.

Question 3

You want to read data from a CSV file in a DataFrame but you want to provide your own column names to DataFrame. What additional argument would you specify in read_csv() ?

Answer

To read data from a CSV file into a DataFrame while providing our own column names, we can use the names argument in the read_csv() function. The syntax is : <DF> = pandas.read_csv(<filepath>, names = <sequence containing column names>).
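
A minimal sketch (data.csv and the column names are illustrative):

import pandas as pd
df = pd.read_csv("data.csv", names = ['Item', 'Qty', 'Price'])   # these names are used as the column headers
print(df)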

Question 4

By default, read_csv() uses the values of first row as column headers in DataFrames. Which argument will you give to ensure that the top/first row's data is used as data and not as column headers ?

Answer

To ensure that the top/first row's data is used as data and not as column headers in a DataFrame when using the read_csv() function, we need to use the header argument and set it to None. The syntax is : <DF> = pandas.read_csv(<filepath>, header = None).

Question 5

Which argument would you give to read_csv() if you only want to read top 10 rows of data ?

Answer

The nrows argument can be used to read only the top 10 rows of data from a CSV file using the read_csv() function in pandas. The nrows argument specifies the number of rows of the file to read. The syntax is : df = pandas.read_csv(<filepath>, nrows = 10).

Question 6

Write command to store data of DataFrame mdf into a CSV file Mydata.csv, with separator character as '@'.

Answer

mdf.to_csv("Mydata.csv", sep = "@")

Question 7

Why do you need connection to an SQL database in order to get data from a table ?

Answer

We need a connection to an SQL database in order to get data from a table because a database can be accessed by many programs simultaneously, and a connection represents a unique session with the database. This connection allows us to send SQL statements to the database and retrieve the results.

Question 8

What is pymysql library of Python ?

Answer

The pymysql library provides a Python interface for connecting to MySQL databases, which allows us to execute SQL queries and retrieve data from the database into our Python program.
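
A minimal sketch of opening a connection with pymysql (the credentials and database name are illustrative):

import pymysql
mycon = pymysql.connect(host = "localhost", user = "root", password = "MyPass", database = "Mydata")
# the connection can now be passed to pandas.read_sql() or used to run queries
mycon.close()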

Question 9

What all libraries do you require in order to interact with MySQL databases (and DataFrame) from within Python ?

Answer

To interact with MySQL databases and DataFrames in Python, we need the pymysql and pandas libraries.

Question 10

What additional argument do you need to specify in to_sql() so that old data of MySQL table is retained ?

Answer

The if_exists argument of the pandas to_sql() function controls what happens when the table already exists. Using if_exists = 'append' appends the new data to the table without deleting the existing data, so the old data of the MySQL table is retained.

Question 11

If query is a string storing an SQL statement. Write statements so that the data is fetched based on query from SQL database Mydata.

Answer

Let query store the following SQL statement:

query = "SELECT * FROM EMPLOYEE WHERE department = 'Human Resource'"

The statements to fetch data based on query from the SQL database Mydata are:

import pandas as pd
import mysql.connector as sqltor
mycon = sqltor.connect(host = "localhost",
                       user = "root",
                       passwd = "MyPass",
                       database = "Mydata")
mdf = pd.read_sql(query, mycon)
print(mdf)
mycon.close()

Type B: Application Based Questions

Question 1

Predict the output of following code fragments one by one. For every next code fragment, consider that the changes by previous code fragment are in place. That is, for code fragment (b), changes made by code fragment (a) are persisting ; for (c), changes by (a) and (b) are persisting and so on.

(a)

import pandas as pd
columns=['2015', '2016', '2017', '2018']
index=['Messi', 'Ronaldo', 'Neymar', 'Hazard']
df = pd.DataFrame(columns = columns, index = index)
print(df)
df.to_csv("c:\one.csv")

(b)

df['2015']['Messi'] = 12 
df['2016']['Ronaldo'] = 11 
df['2017']['Neymar'] = 8 
df['2018']['Hazard'] = 16 
print(df)
df.to_csv("c:\\two.csv", sep = '@')

(c)

new_df = pd.read_csv('c:\one.csv', index_col = 0) 
print(new_df)

(d)

new_df = pd.read_csv('c:\one.csv') 
print(new_df)

(e)

new_df = pd.read_csv('c:\\two.csv') 
print(new_df)

(f)

new_df = pd.read_csv('c:\\two.csv', sep = '@')
print(new_df)

Answer

(a)

Output
        2015 2016 2017 2018
Messi    NaN  NaN  NaN  NaN
Ronaldo  NaN  NaN  NaN  NaN
Neymar   NaN  NaN  NaN  NaN
Hazard   NaN  NaN  NaN  NaN
Explanation

The DataFrame df is created with defined columns and index labels. It is then printed, showing NaN values since no data is provided. Afterwards, df is written to a CSV file named one.csv.

The contents of the file one.csv are as follows:

,2015,2016,2017,2018
Messi,,,,
Ronaldo,,,,
Neymar,,,,
Hazard,,,,

(b)

Output
        2015 2016 2017 2018
Messi     12  NaN  NaN  NaN
Ronaldo  NaN   11  NaN  NaN
Neymar   NaN  NaN    8  NaN
Hazard   NaN  NaN  NaN   16
Explanation

The code assigns values to specific cells in the DataFrame df, such as setting the value of 'Messi' in the '2015' column to 12 and so on. After printing the modified DataFrame, it is saved to a CSV file named two.csv using '@' as the separator.

The contents of the file two.csv are as follows:

@2015@2016@2017@2018
Messi@12@@@
Ronaldo@@11@@
Neymar@@@8@
Hazard@@@@16

(c)

Output
         2015  2016  2017  2018
Messi     NaN   NaN   NaN   NaN
Ronaldo   NaN   NaN   NaN   NaN
Neymar    NaN   NaN   NaN   NaN
Hazard    NaN   NaN   NaN   NaN
Explanation

The code reads the CSV file named one.csv into a new DataFrame new_df. The parameter index_col = 0 means the first column of the CSV file will be used as the index. It then prints the contents of new_df. Since the data cells in the CSV file are empty, new_df contains NaN values, just like the original DataFrame.

The contents of the file one.csv are as follows:

,2015,2016,2017,2018
Messi,,,,
Ronaldo,,,,
Neymar,,,,
Hazard,,,,

(d)

Output
  Unnamed: 0  2015  2016  2017  2018
0      Messi   NaN   NaN   NaN   NaN
1    Ronaldo   NaN   NaN   NaN   NaN
2     Neymar   NaN   NaN   NaN   NaN
3     Hazard   NaN   NaN   NaN   NaN
Explanation

The code uses the pandas read_csv() function to extract data from the one.csv file. The retrieved data is stored in a DataFrame called new_df, which is then printed. Since the index_col parameter is not specified, the DataFrame uses the default numeric index. The first column is displayed with the header "Unnamed: 0" because it has no name in the CSV file's header row.

The contents of the file one.csv are as follows:

,2015,2016,2017,2018
Messi,,,,
Ronaldo,,,,
Neymar,,,,
Hazard,,,,

(e)

Output
  @2015@2016@2017@2018
0          Messi@12@@@
1        Ronaldo@@11@@
2          Neymar@@@8@
3         Hazard@@@@16
Explanation

The code uses the pandas read_csv() function to extract data from the two.csv file. The retrieved data is stored in a DataFrame called new_df and printed. However, since the actual separator in the two.csv file is '@' and the sep parameter is not specified, pandas assumes the data is comma-separated and cannot parse it correctly; each line is read as a single value, so the DataFrame does not show the correct structure of the data.

The contents of the file two.csv are as follows:

@2015@2016@2017@2018
Messi@12@@@
Ronaldo@@11@@
Neymar@@@8@
Hazard@@@@16

(f)

Output
  Unnamed: 0  2015  2016  2017  2018
0      Messi  12.0   NaN   NaN   NaN
1    Ronaldo   NaN  11.0   NaN   NaN
2     Neymar   NaN   NaN   8.0   NaN
3     Hazard   NaN   NaN   NaN  16.0
Explanation

The code extracts data from the two.csv file using the read_csv() function and assigns it to the new_df DataFrame, which is then printed. The sep = '@' parameter ensures that the data is parsed correctly, and the default numeric index is used since index_col is not specified. The first column is labelled "Unnamed: 0" because it has no name in the CSV file's header row.

The contents of the file two.csv are as follows:

@2015@2016@2017@2018
Messi@12@@@
Ronaldo@@11@@
Neymar@@@8@
Hazard@@@@16

Question 2

Are the following two statements same ? Why/Why not ?

(i) pd.read_csv('zoo.csv', sep = ',')

(ii) pd.read_csv('zoo.csv')

Answer

Yes, the two statements are the same. When the sep parameter is not explicitly specified in pd.read_csv(), pandas uses the default separator, a comma (,). So both statements tell pandas to read the CSV file "zoo.csv" with comma-separated values.

Question 3

How are following two codes similar or different ? What output will they produce ?

(i)

df = pd.read_csv("data.csv", nrows = 5) 
print(df)

(ii)

df = pd.read_csv("data.csv")
print(df)

Answer

The two codes are similar in that they both use pd.read_csv() to read a CSV file named 'data.csv' into a pandas DataFrame df. However, they differ in their usage of the nrows parameter. The first code uses the nrows parameter with a value of 5, indicating that it reads only the first 5 rows of the CSV file. On the other hand, the second code does not have the nrows parameter, so it reads the entire CSV file.

For code (i), the output will be a DataFrame containing the first 5 rows of the 'data.csv' file. For code (ii), the output will be a DataFrame containing all the rows of the 'data.csv' file.

Question 4

Write Python statement to export the DataFrame to a CSV file named data.csv stored at D: drive.

Answer

<DF>.to_csv('D:\\data.csv')

Question 5

What is the difference between following two statements ?

(i)

df.to_sql('houses', con = conn, if_exists = 'replace')

(ii)

df.to_sql('houses', con = conn, if_exists = 'replace', index = False)

Answer

The difference between the two statements is whether the DataFrame's index is included as a separate column in the resulting SQL table. By default, when we use to_sql() without specifying the index parameter, index = True is assumed, meaning that the DataFrame's index will be included in the SQL table, as in the first statement. Setting index = False explicitly excludes the DataFrame's index from being saved as a separate column in the SQL table, as in the second statement.

Question 6

Consider following code when conn is the name of established connection to MySQL database.

Cars = {'Brand': ['Alto', 'Zen', 'City', 'Kia'],
        'Price': [22000, 25000, 27000, 35000]}
df = pd.DataFrame(Cars, columns= ['Brand', 'Price']) 
df.to_sql('CARS', conn, if_exists = 'replace', index = False)

What will be the output of following query if executed on MySQL ?

SELECT * from CARS ;

Answer

Output
+-------+-------+
| Brand | Price |
+-------+-------+
| Alto  | 22000 |
| Zen   | 25000 |
| City  | 27000 |
| Kia   | 35000 |
+-------+-------+
Explanation

The code initializes a DataFrame df. It then writes this data to an SQL database table named 'CARS' using to_sql(). The SQL query SELECT * FROM CARS; retrieves all columns and rows from the 'CARS' table.

Question 7

Consider following code when conn is the name of established connection to MySQL database.

sql = SELECT * from Sales where zone = "central"; 
df = pandas.read_sql(sql, conn)
df.head()

What will be stored in df ?

Answer

The DataFrame df stores all the columns of those rows of the Sales table in which the zone column has the value "central".

Explanation

The code executes an SQL query that selects all columns (*) from the "Sales" table where the "zone" column equals "central". It reads the result of the query into a pandas DataFrame df using pandas.read_sql(). Then, df.head() displays the first five rows of df.

Type C: Programming Practice/Knowledge Based Questions

Question 1

Write a program to read details such as Item, Sales made in a DataFrame and then store this data in a CSV file.

Solution
import pandas as pd
data = {'Item': ['Apple', 'Banana', 'Orange', 'Grapes'],
        'Sales': [100, 150, 80, 120]}
df = pd.DataFrame(data)
df.to_csv('one.csv', index = False)
Output
Item,Sales
Apple,100
Banana,150
Orange,80
Grapes,120

Question 2

Write a program to read data from a CSV file where separator character is '@'. Make sure that :

  • the top row is used as data, not as column headers.
  • only 20 rows are read into DataFrame.

Answer

Let the contents of the file bike.csv be the following:

Brand@Price
Honda@2500
Yamaha@2800
Suzuki@2100
Kawasaki@3200
Ducati@3500
BMW@4500
Harley-Davidson@5500
KTM@4000
Triumph@5300
Aprilia@4800
Indian@5700
Royal Enfield@3000
MV Agusta@4200
Moto Guzzi@4900
Victory@5600
Benelli@3900
Husqvarna@4800
Zero Motorcycles@6500
Energica@7500
Norton@5900

The program to read data from a CSV file is as follows:

import pandas as pd
d = pd.read_csv('bike.csv', sep = '@', header = None, nrows = 20)
print(d)
Output
                   0      1
0              Brand  Price
1              Honda   2500
2             Yamaha   2800
3             Suzuki   2100
4           Kawasaki   3200
5             Ducati   3500
6                BMW   4500
7    Harley-Davidson   5500
8                KTM   4000
9            Triumph   5300
10           Aprilia   4800
11            Indian   5700
12     Royal Enfield   3000
13         MV Agusta   4200
14        Moto Guzzi   4900
15           Victory   5600
16           Benelli   3900
17         Husqvarna   4800
18  Zero Motorcycles   6500
19          Energica   7500

Question 3

Write a program to get following data in two DataFrames :

df1:

Roll no    Name
1          ABC
2          DEF

df2:

Roll no    Marks 1    Marks 2    Marks 3
1          70         80         75
2          60         65         70

Store these DataFrames as two separate tables in the same database.

Solution
import pandas as pd
from sqlalchemy import create_engine
import pymysql
data1 = {'Roll no': [1, 2], 'Name': ['ABC', 'DEF']}
data2 = {'Roll no': [1, 2], 'Marks1' : [70, 60], 'Marks2': [80, 65],'Marks3' : [75, 70]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
engine = create_engine('mysql+pymysql://root:Mypass@localhost/database')
mycon = engine.connect()
df1.to_sql('first', mycon, if_exists = 'replace', index = False)
df2.to_sql('second', mycon, if_exists = 'replace', index = False)
Output
First table:

+---------+------+
| Roll no | Name |
+---------+------+
|       1 | ABC  |
|       2 | DEF  |
+---------+------+

Second table:

+---------+--------+--------+--------+
| Roll no | Marks1 | Marks2 | Marks3 |
+---------+--------+--------+--------+
|       1 |     70 |     80 |     75 |
|       2 |     60 |     65 |     70 |
+---------+--------+--------+--------+

Question 4

You have a database on MySQL namely school having three tables in it — Student, Subject, Teacher. Write a program to store these tables in three DataFrames.

Solution
import pandas as pd
import mysql.connector as sqltor
mycon = sqltor.connect(host = "localhost",
                       user = "root",
                       passwd = "mypass",
                       database = "school")
mdf = pd.read_sql("SELECT * FROM student;", mycon)
ndf = pd.read_sql("SELECT * FROM subject;", mycon)
ldf = pd.read_sql("SELECT * FROM teacher;", mycon)
print("Student dataframe:\n")
print(mdf)
print("Subject dataframe:\n")
print(ndf)
print("Teacher dataframe:\n")
print(ldf)
Output
Student dataframe:

   NO     NAME  AGE DEPARTMENT  FEE SEX
0   1   PANKAJ   24   COMPUTER  120   M
1   2  SHALINI   21    HISTORY  200   F
2   3   SANJAY   22      HINDI  300   M
3   4    SUDHA   25    HISTORY  400   F
4   5   ANANYA   23   COMPUTER  450   F

Subject dataframe:

   Subject_id     name  Teacher_id
0         101     math         201
1         102  science         202
2         103  english         203
3         104    hindi         204

Teacher dataframe:

   No      Name  Age Department  Salary Sex  Dateofjoin
0   1     Jugal   34   Computer   12000   M  1997-01-10
1   2  Sharmila   31    History   20000   F  1998-03-24
2   3   Sandeep   32      Maths   30000   M  1996-12-12
3   4  Sangeeta   35    History   40000   F  1999-07-01
4   5    Rakesh   42      Maths   25000   M  1997-09-05
5   6     Shyam   50    History   30000   M  1998-06-27
6   7   Shiv Om   44   Computer   21000   M  1997-02-25
7   8  Shalakha   33      Maths   20000   F  1997-07-31

Question 5

The DataFrame SDF stores the sales records of 100 salesmen. Write a program to store this as a table in database namely "company" on MySQL.

Solution
import pandas as pd
from sqlalchemy import create_engine
import pymysql

def sales(sdf):
    engine = create_engine('mysql+pymysql://root:Mypass@localhost/company')
    mycon = engine.connect()
    sdf.to_sql('Records', mycon, if_exists = 'replace', index = False)

Question 6

Consider the SDF DataFrame storing the sales records of 100 salesmen. Write a program that stores only the first 25 rows of the DataFrame in a table on MySQL database.

Solution
import pandas as pd
from sqlalchemy import create_engine
import pymysql

def sales(sdf):
    engine = create_engine('mysql+pymysql://root:Mypass@localhost/company')
    mycon = engine.connect()
    first_25_rows = sdf.head(25)
    first_25_rows.to_sql('Records', mycon, if_exists = 'replace', index = False)

Question 7

The sales table of the company database of MySQL stores the sales records of 100 salesmen. Write a program to load only those records in a DataFrame which have made sales of more than 50000/-.

Solution
import pandas as pd
import mysql.connector as sqltor
mycon = sqltor.connect(host = "localhost", user = "root", passwd = "mypass", database = "company")
df = pd.read_sql("SELECT * FROM sales WHERE Sales > 50000;", mycon)
print(df)