Importing/Exporting Data between CSV Files/MySQL and Pandas
What is CSV file ? Is it a text file ?
Answer
The acronym CSV is short for Comma-Separated Values, which refers to a tabular data saved as plaintext where data values are separated by commas.
Name the function you would use to read data from a CSV file into a DataFrame.
Answer
The function used to read data from a CSV file into a DataFrame is read_csv()
.
Name the function you would use to store data from a DataFrame into a CSV file.
Answer
The function used to store data from a DataFrame into a CSV file is to_csv()
.
Which function lets you fetch data from an SQL table into a DataFrame ?
Answer
The function used to fetch data from an SQL table into a DataFrame is read_sql()
.
Which function lets you store data of a DataFrame into an SQL table ?
Answer
The function used to store data of a DataFrame into an SQL table is to_sql()
.
Which additional argument would you give to append the data of a DataFrame in an existing SQL table ?
Answer
The if_exists = 'append'
argument in the to_sql()
function of pandas is used to append the data of a DataFrame into the existing SQL table.
CSV stands for :
- Column Separated Value
- Class Separated Value
- Comma Separated Value
- None of the above
Answer
Comma Separated Value
Reason — The acronym CSV is short for Comma-Separated Value.
A CSV file can take ............... character as separator.
- ,
- ~
- |
- \t
- only (a)
- all of these
Answer
all of these
Reason — All of the listed characters can be used as a separator in a CSV file, the default separator is comma.
In order to work with CSV files from panda, you need to import ............... , other than pandas.
- .csv
- pandas.io
- newcsv
- no extra package required
Answer
no extra package required
Reason — Python's Pandas library offers two functions, read_csv()
and to_csv()
, that help bring data from a CSV file into a DataFrame and write a DataFrame's data to a CSV file. Therefore, we do not need to import any additional packages or modules other than pandas.
The correct statement to read from a CSV file in a DataFrame is :
- <DF>.read_csv(<file>)
- <File>. read_csv( )(<DF>)
- <DF> = pandas.read(<file>)
- <DF> = pandas.read_csv(<files>)
Answer
<DF> = pandas.read_csv(<files>)
Reason — The statement to read data from a CSV file into a DataFrame is <DF> = pandas.read_csv(<filepath>)
.
Which argument do you specify with read_csv() to specify a separator character ?
- character
- char
- separator
- sep
Answer
sep
Reason — The sep
argument in the read_csv()
function is used to specify the separator character. By default, this is set to a comma (,).
To suppress first row as header, which of the following arguments is to be given in read_csv() ?
- noheader = True
- header = None
- skipheader = True
- header = Null
Answer
header = None
Reason — The header = None
argument is to be given in read_csv
to suppress the first row as the header.
To read specific number of rows from a CSV file, which argument is to be given in read_csv() ?
- rows = <n>
- nrows = <n>
- n_rows = <n>
- number_rows = <n>
Answer
nrows = <n>
Reason — The argument nrows = <n>
is to be given in read_csv()
to read the specified number of rows from the CSV file.
To skip first 5 rows of CSV file, which argument will you give in read_csv() ?
- skiprows = 5
- skip_rows = 5
- skip = 5
- noread = 5
Answer
skiprows = 5
Reason — The skiprows = 5
argument in the read_csv()
function is used to skip the first 5 rows of the CSV file and start reading from the 6th row onwards.
To skip 1st, 3rd and 5th row of CSV file, which argument will you give in read_csv() ?
- skiprows = 1 | 13 | 5
- skiprows = [1, 5, 1]
- skiprows = [1, 3, 5]
- any of these
Answer
skiprows = [1, 3, 5]
Reason — The argument to skip 1st, 3rd and 5th row of CSV file in read_csv()
is skiprows = [1, 3, 5]
.
While reading from a CSV file, to use a column's values as index labels, argument given in read_CSV() is :
- index
- index_col
- index_values
- index_label
Answer
index_col
Reason — The argument given in read_csv()
to use a column's values as index labels is index_col
. By specifying index_col
with the name of the desired column, pandas will use the values from that column as the index labels for the DataFrame.
While writing a DataFrame onto a CSV file, which argument would you use in to_csv() for NaN values' representation as NULL ?
- NaN = NULL
- na_rep = NULL
- na_value = NULL
- na = NULL
Answer
na_rep = NULL
Reason — The na_rep
argument in the to_csv()
method is used to specify the representation of NaN values in the resulting CSV file. Using na_rep = NULL
means that NaN values will be represented as "NULL" in the CSV file.
Which of the following libraries let you establish a connection with a MySQL database from within Python ?
- mysql.connector
- Pymysql, sqlalchemy
- pandas
- numpy
Answer
mysql.connector, Pymysql, sqlalchemy
Reason — The libraries mysql.connector, PyMySQL, and SQLAlchemy all enable to establish connections with MySQL databases from within Python.
In pandas.read_sql(<A>, <B>), <A> is
- connection name
- table name
- SQL query string
- database name
Answer
SQL query string
Reason — The <A>
parameter in pandas.read_sql(<A>, <B>)
represents the SQL query string used to fetch data from the database.
In pandas.read_sql(<A>, <B>), <B> is
- connection name
- table name
- SQL query string
- database name
Answer
connection name
Reason — The <B>
parameter in pandas.read_sql(<A>, <B>)
represents the connection object or the database connection.
To suppress the creation of a column for index labels of a DataFrame, ............... argument is specified in to_sql().
- if_exists = False
- index = False
- index = True
- if_exists = True
Answer
index = False
Reason — The index = False
argument in to_sql()
is used to suppress the creation of a column for index labels of a DataFrame. This means that the index labels will not be included as a separate column in the database table.
To append the content of DataFrame in a table of MySQL, ............... argument is used in to_sql().
- if_exists = "Add"
- if_exists = "append"
- if_exists = Add
- if_exists = append
Answer
if_exists = "append"
Reason — The if_exists = "append"
argument in to_sql()
is used to append the content of a DataFrame to an existing table in MySQL. This ensures that the data from the DataFrame is added to the specified table without overwriting or deleting any existing data.
Full form of CSV is comma separated value.
Default separator of CSV files is , (comma).
To load data of a CSV file in a DataFrame read_csv function is used.
To write data of a DataFrame in a CSV file, to_csv function is used.
To specify a separator other than comma in a CSV file, sep argument is used.
To specify the string to represent NaN values in a CSV file, na_rep argument in to_sql() is used.
To load data in a DataFrame from mysql table, read_sql() function is used.
To write data of a DataFrame in a mysql table, to_sql() function is used.
CSV files can only store comma separated values.
Answer
False
Reason — CSV files can store values separated by various delimiters such as semicolons, tabs, spaces, etc. The default delimiter is a comma.
The number of rows in a DataFrame are by default equal to number of rows in a CSV file, it created from a CSV file.
Answer
True
Reason — When we create a DataFrame from a CSV file using pandas, by default, the number of rows in the DataFrame will be equal to the number of rows in the CSV file.
Pandas can only read from CSV files but can't write CSV files.
Answer
False
Reason — Python's Pandas library provides two functions, read_csv()
and to_csv()
, which are used for reading data from a CSV file into a DataFrame and writing data from a DataFrame to a CSV file, respectively.
You need to import CSV package in order to store a DataFrame in a CSV file.
Answer
False
Reason — In Python, we don't need to import the CSV package specifically to store a DataFrame in a CSV file when using the Pandas library. The DataFrame.to_csv()
method is provided by Pandas itself and handles the conversion of DataFrame data to a CSV file format.
The read_csv() can handle different separator characters but not to_csv().
Answer
False
Reason — Both the read_csv()
and to_csv()
functions in Pandas can handle different separator characters.
In order to read from a MySQL table into a DataFrame, the table must exist.
Answer
True
Reason — In order to read data from a MySQL table into a DataFrame using pandas, the table must exist in the MySQL database. If the table doesn't exist, the read operation will fail because there would be no data to retrieve.
The to_sql() function can append to a MySQL table.
Answer
True
Reason — The to_sql()
function in pandas, when used with the if_exists='append'
parameter, can append data to an existing MySQL table.
The to_sql() function cannot create a MySQL table.
Answer
False
Reason — The to_sql()
function in pandas, when used with the if_exists = 'replace'
parameter, can create a MySQL table.
The to_sql() function cannot work if the named table already exists.
Answer
False
Reason — The to_sql()
function can work with existing tables in MySQL.
The read_sql() can also read from CSV files.
Answer
False
Reason — The read_sql()
function in pandas can read data from SQL databases using SQL queries. To read from CSV files, we use the read_csv()
function in pandas.
Assertion. Python Panda library offers functionality to interact with a CSV file.
Reason. Panda's read_csv() and to_csv() functions can read-from and write-to CSV files.
- Both A and R are true and R is the correct explanation of A.
- Both A and R are true but R is not the correct explanation of A.
- A is true but R is false.
- A is false but R is true.
Answer
Both A and R are true and R is the correct explanation of A.
Explanation
Python's Pandas library offers functionality to interact with a CSV file. It provides two functions, read_csv()
and to_csv()
, which are used for reading data from a CSV file into a DataFrame and writing data from a DataFrame to a CSV file, respectively.
Assertion. The read_csv() function of Python Pandas can read data of a csv file into any of pandas data structures.
Reason. DataFrame is a compatible data structure for read_csv() function.
- Both A and R are true and R is the correct explanation of A.
- Both A and R are true but R is not the correct explanation of A.
- A is true but R is false.
- A is false but R is true.
Answer
A is false but R is true.
Explanation
The read_csv()
function of Python Pandas can only read data from a CSV file into the DataFrame data structure of Pandas. DataFrame is a compatible data structure for the read_csv()
function, as it is the primary data structure designed to be created by read_csv()
.
Assertion. The read_csv() function reads a csv file's data into a DataFrame.
Reason. The to_csv() function writes a DataFrame on to a csv file.
- Both A and R are true and R is the correct explanation of A.
- Both A and R are true but R is not the correct explanation of A.
- A is true but R is false.
- A is false but R is true.
Answer
Both A and R are true but R is not the correct explanation of A.
Explanation
The read_csv()
function in Pandas is used to read data from a CSV file into a DataFrame. Conversely, the to_csv()
function is used to write DataFrame data to a CSV file.
Assertion. The read_sql() function of Pandas can query upon any mysql database.
Reason. The read_sql() function can query upon only those databases that have a connection established through mysql database connector.
- Both A and R are true and R is the correct explanation of A.
- Both A and R are true but R is not the correct explanation of A.
- A is true but R is false.
- A is false but R is true.
Answer
A is false but R is true.
Explanation
The read_sql()
function is capable of querying data from MySQL databases, it is limited to databases for which a connection has been established through a MySQL database connector. Without this connection, the read_sql()
function cannot query data from the MySQL database.
Assertion. A DataFrame's data can be exported as a table of a mysql database.
Reason. Over an established connection to a mysql database, <DF>.to_sql() would write the data of the DataFrame <DF> as a table in the mysql database.
- Both A and R are true and R is the correct explanation of A.
- Both A and R are true but R is not the correct explanation of A.
- A is true but R is false.
- A is false but R is true.
Answer
Both A and R are true and R is the correct explanation of A.
Explanation
In Pandas, a DataFrame's data can be exported as a table of a mysql database. Over an established connection to a mysql database, the DF.to_sql()
method is used to write the data of the DataFrame DF
as a table in the mysql database.
What are advantages of CSV file formats ?
Answer
The advantages of CSV file formats are as follows:
It is a simple, compact and ubiquitous format for data storage.
It is a common format for data interchange.
It can be opened in popular spreadsheet packages like MS-Excel, Calc etc.
Nearly all spreadsheets and databases support import/export to csv format.
What all libraries do you require in order to bring data from a CSV file into a DataFrame ?
Answer
Python's Pandas library is required to bring data from a CSV file into a DataFrame.
You want to read data from a CSV file in a DataFrame but you want to provide your own column names to DataFrame. What additional argument would you specify in read_csv() ?
Answer
To read data from a CSV file into a DataFrame while providing our own column names, we can use the names
argument in the read_csv()
function. The syntax is : <DF> = pandas.read_csv(<filepath>, names = <sequence containing column names>)
.
By default, read_csv() uses the values of first row as column headers in DataFrames. Which argument will you give to ensure that the top/first row's data is used as data and not as column headers ?
Answer
To ensure that the top/first row's data is used as data and not as column headers in a DataFrame when using the read_csv()
function, we need to use the header
argument and set it to None. The syntax is : <DF> = pandas.read_csv(<filepath>, header = None)
.
Which argument would you give to read_csv() if you only want to read top 10 rows of data ?
Answer
The nrows
argument can be used to read only the top 10 rows of data from a CSV file using the read_csv()
function in pandas. The nrows
argument specifies the number of rows of the file to read. The syntax is : df = pandas.read_csv(<filepath>, nrows = 10)
.
Write command to store data of DataFrame mdf into a CSV file Mydata.csv, with separator character as '@'.
Answer
mdf.to_csv("Mydata.csv", sep = "@")
Why do you need connection to an SQL database in order to get data from a table ?
Answer
We need a connection to an SQL database in order to get data from a table because a database can be accessed by many programs simultaneously, and a connection represents a unique session with the database. This connection allows us to send SQL statements to the database and retrieve the results.
What is pymysql library of Python ?
Answer
The pymysql library provides a Python interface for connecting to MySQL databases, which allows to execute SQL queries and retrieve data from the database into our Python program.
What all libraries do you require in order to interact with MySQL databases (and DataFrame) from within Python ?
Answer
To interact with MySQL databases and DataFrames in Python, we need the pymysql and pandas libraries.
What additional argument do you need to specify in to_sql() so that old data of MySQL table is retained ?
Answer
The if_exists
argument in the pandas to_sql()
function retains old data in a MySQL table. Using if_exists = append
appends new data without deleting existing data.
If query is a string storing an SQL statement. Write statements so that the data is fetched based on query from SQL database Mydata.
Answer
Let query
store the following SQL statement:
query = "SELECT * FROM EMPLOYEE WHERE department = 'Human Resource'"
import pandas as pd
import mysql.connector as sqltor
mycon = sqltor.connect(host = "localhost",
user = "root",
passwd = "MyPass",
database = "Mydata")
query = "SELECT * FROM EMPLOYEE WHERE department = 'Human Resource'"
mdf = pd.read_sql(query, mycon)
print(mdf)
mycon.close()
Predict the output of following code fragments one by one. For every next code fragment, consider that the changes by previous code fragment are in place. That is, for code fragment (b), changes made by code fragment (a) are persisting ; for (c), changes by (a) and (b) are persisting and so on.
(a)
import pandas as pd
columns=['2015', '2016', '2017', '2018']
index=['Messi', 'Ronaldo', 'Neymar', 'Hazard']
df = pd.DataFrame(columns = columns, index = index)
print(df)
df.to_csv("c:\one.csv")
(b)
df['2015']['Messi'] = 12
df['2016']['Ronaldo'] = 11
df['2017']['Neymar'] = 8
df['2018']['Hazard'] = 16
print(df)
df.to_csv("c:\\two.csv", sep = '@')
(c)
new_df = pd.read_csv('c:\one.csv', index_col = 0)
print(new_df)
(d)
new_df = pd.read_csv('c:\one.csv')
print(new_df)
(e)
new_df = pd.read_csv('c:\\two.csv')
print(new_df)
(f)
new_df = pd.read_csv('c:\\two.csv', sep = '@')
print(new_df)
Answer
(a)
2015 2016 2017 2018
Messi NaN NaN NaN NaN
Ronaldo NaN NaN NaN NaN
Neymar NaN NaN NaN NaN
Hazard NaN NaN NaN NaN
The DataFrame df
is created with defined columns and index labels. It is then printed, showing NaN values since no data is provided. Afterwards, df
is written to a CSV file named one.csv
.
The contents of the file one.csv
are as follows:
,2015,2016,2017,2018
Messi,,,,
Ronaldo,,,,
Neymar,,,,
Hazard,,,,
(b)
2015 2016 2017 2018
Messi 12 NaN NaN NaN
Ronaldo NaN 11 NaN NaN
Neymar NaN NaN 8 NaN
Hazard NaN NaN NaN 16
The code assigns values to specific cells in the DataFrame df
, such as setting the value of 'Messi' in the '2015' column to 12 and so on. After printing the modified DataFrame, it is saved to a CSV file named two.csv
using '@' as the separator.
The contents of the file two.csv
are as follows:
@2015@2016@2017@2018
Messi@12@@@
Ronaldo@@11@@
Neymar@@@8@
Hazard@@@@16
(c)
2015 2016 2017 2018
Messi NaN NaN NaN NaN
Ronaldo NaN NaN NaN NaN
Neymar NaN NaN NaN NaN
Hazard NaN NaN NaN NaN
The code reads the CSV file named one.csv
into a new DataFrame new_df
. The parameter index_col = 0
means the first column of the CSV file will be used as the index. It then prints the contents of new_df
. Since the CSV file has an empty DataFrame, the new_df
DataFrame will be empty as well.
The contents of the file one.csv
are as follows:
,2015,2016,2017,2018
Messi,,,,
Ronaldo,,,,
Neymar,,,,
Hazard,,,,
(d)
Unnamed: 0 2015 2016 2017 2018
0 Messi NaN NaN NaN NaN
1 Ronaldo NaN NaN NaN NaN
2 Neymar NaN NaN NaN NaN
3 Hazard NaN NaN NaN NaN
The code used the pandas read_csv()
function to extract data from one.csv
file. The retrieved data is then stored in a DataFrame called new_df
and then it is printed. Since the index_col
parameter is not specified, the DataFrame is using default numerical indexes. The DataFrame displays "Unnamed: 0" as the header for the first column because the CSV file doesn't have a header row.
The contents of the file one.csv
are as follows:
,2015,2016,2017,2018
Messi,,,,
Ronaldo,,,,
Neymar,,,,
Hazard,,,,
(e)
@2015@2016@2017@2018
0 Messi@12@@@
1 Ronaldo@@11@@
2 Neymar@@@8@
3 Hazard@@@@16
The code used the pandas read_csv()
function to extract data from the two.csv
file. The retrieved data is then stored in a DataFrame called new_df
and printed. However, if the actual separator in the two.csv
file is "@" instead of the default comma (,), then the code without specifying the sep
parameter, will not parse the data correctly. It will assume that the data is comma-separated and will not display the correct structure of the data in the DataFrame.
The contents of the file two.csv
are as follows:
@2015@2016@2017@2018
Messi@12@@@
Ronaldo@@11@@
Neymar@@@8@
Hazard@@@@16
(f)
Unnamed: 0 2015 2016 2017 2018
0 Messi 12.0 NaN NaN NaN
1 Ronaldo NaN 11.0 NaN NaN
2 Neymar NaN NaN 8.0 NaN
3 Hazard NaN NaN NaN 16.0
The code extracts data from the two.csv
file using the read_csv()
function and assigns it to the new_df
DataFrame. It then prints new_df
. The sep
parameter is used to ensure that the DataFrame is correctly parsed, and default indexes are used since index_col
is not specified. The term 'Unnamed' is used because a header for the first column is not mentioned in the CSV file.
The contents of the file two.csv
are as follows:
@2015@2016@2017@2018
Messi@12@@@
Ronaldo@@11@@
Neymar@@@8@
Hazard@@@@16
Are the following two statements same ? Why/Why not ?
(i) pd.read_csv('zoo.csv', sep = ',')
(ii) pd.read_csv('zoo.csv')
Answer
Yes, the two statements are same. The reason is that when we don't explicitly specify the sep
parameter in pd.read_csv()
, pandas assumes the default separator to be a comma (,). So, both statements are telling pandas to read the CSV file "zoo.csv" with comma-separated values.
How are following two codes similar or different ? What output will they produce ?
(i)
df = pd.read_csv("data.csv", nrows = 5)
print(df)
(ii)
df = pd.read_csv("data.csv")
print(df)
Answer
The two codes are similar in that they both use pd.read_csv()
to read a CSV file named 'data.csv' into a pandas DataFrame df
. However, they differ in their usage of the nrows
parameter. The first code uses the nrows
parameter with a value of 5, indicating that it reads only the first 5 rows of the CSV file. On the other hand, the second code does not have the nrows
parameter, so it reads the entire CSV file.
For code (i), the output will be a DataFrame containing the first 5 rows of the 'data.csv' file. For code (ii), the output will be a DataFrame containing all the rows of the 'data.csv' file.
Write Python statement to export the DataFrame to a CSV file named data.csv stored at D: drive.
Answer
DataFrame.to_csv('D:\\data.csv')
What is the difference between following two statements ?
(i)
df.to_sql('houses', con = conn, if_exists = 'replace')
(ii)
df.to_sql('houses', con = conn, if_exists = 'replace', index = False)
Answer
The difference between the two statements is whether the DataFrame's index is included as a separate column in the resulting SQL table. By default, when we use to_sql()
without specifying the index
parameter, index = True
is assumed, meaning that the DataFrame's index will be included in the SQL table, as in the first statement. Setting index = False
explicitly excludes the DataFrame's index from being saved as a separate column in the SQL table, as in the second statement.
Consider following code when conn is the name of established connection to MySQL database.
Cars = {'Brand': ['Alto', 'Zen', 'City', 'Kia'],
'Price': [22000, 25000, 27000, 35000]}
df = pd.DataFrame(Cars, columns= ['Brand', 'Price'])
df.to_sql('CARS', conn, if_exists = 'replace', index = False)
What will be the output of following query if executed on MySQL ?
SELECT * from CARS ;
Answer
+-------+-------+
| Brand | Price |
+-------+-------+
| Alto | 22000 |
| Zen | 25000 |
| City | 27000 |
| Kia | 35000 |
+-------+-------+
The code initializes a DataFrame df
. It then writes this data to an SQL database table named 'CARS' using to_sql()
. The SQL query SELECT * FROM CARS;
retrieves all columns and rows from the 'CARS' table.
Consider following code when conn is the name of established connection to MySQL database.
sql = SELECT * from Sales where zone = "central";
df = pandas.read_sql(sql, conn)
df.head()
What will be stored in df ?
Answer
The DataFrame df
includes all the columns where 'zone' equals 'central'.
The code executes an SQL query to select all columns (*) from the "Sales" table where the "zone" column equals to "central". It then reads the results of the query into a pandas DataFrame df
using pandas.read_sql()
. Then, it returns the first five rows of df
using df.head()
.
Write a program to read details such as Item, Sales made in a DataFrame and then store this data in a CSV file.
import pandas as pd
data = {'Item': ['Apple', 'Banana', 'Orange', 'Grapes'],
'Sales': [100, 150, 80, 120]}
df = pd.DataFrame(data)
df.to_csv('one.csv', index = False)
Item,Sales
Apple,100
Banana,150
Orange,80
Grapes,120
Write a program to read data from a CSV file where separator character is '@'. Make sure that :
- the top row is used as data, not as column headers.
- only 20 rows are read into DataFrame.
Answer
Let the contents of the file bike.csv
be the following:
Brand@Price
Honda@2500
Yamaha@2800
Suzuki@2100
Kawasaki@3200
Ducati@3500
BMW@4500
Harley-Davidson@5500
KTM@4000
Triumph@5300
Aprilia@4800
Indian@5700
Royal Enfield@3000
MV Agusta@4200
Moto Guzzi@4900
Victory@5600
Benelli@3900
Husqvarna@4800
Zero Motorcycles@6500
Energica@7500
Norton@5900
The program to read data from a CSV file is as follows:
import pandas as pd
d = pd.read_csv('one.csv', sep = '@', header = None, nrows = 20)
print(d)
0 1
0 Brand Price
1 Honda 2500
2 Yamaha 2800
3 Suzuki 2100
4 Kawasaki 3200
5 Ducati 3500
6 BMW 4500
7 Harley-Davidson 5500
8 KTM 4000
9 Triumph 5300
10 Aprilia 4800
11 Indian 5700
12 Royal Enfield 3000
13 MV Agusta 4200
14 Moto Guzzi 4900
15 Victory 5600
16 Benelli 3900
17 Husqvarna 4800
18 Zero Motorcycles 6500
19 Energica 7500
Write a program to get following data in two DataFrames :
df 1:
Roll no | Name |
---|---|
1 | ABC |
2 | DEF |
df2:
Roll no | Marks 1 | Marks 2 | Marks 3 |
---|---|---|---|
1 | 70 | 80 | 75 |
2 | 60 | 65 | 70 |
Store these DataFrames as two separate tables in the same database.
import pandas as pd
from sqlalchemy import create_engine
import pymysql
data1 = {'Roll no': [1, 2], 'Name': ['ABC', 'DEF']}
data2 = {'Roll no': [1, 2], 'Marks1' : [70, 60], 'Marks2': [80, 65],'Marks3' : [75, 70]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
engine = create_engine('mysql+pymysql://root:Mypass@localhost/database')
mycon = engine.connect()
df1.to_sql('first', mycon, if_exists = 'replace', index = False)
df2.to_sql('second', mycon, if_exists = 'replace', index = False)
First table:
+---------+------+
| Roll no | Name |
+---------+------+
| 1 | ABC |
| 2 | DEF |
+---------+------+
Second table:
+---------+--------+--------+--------+
| Roll no | Marks1 | Marks2 | Marks3 |
+---------+--------+--------+--------+
| 1 | 70 | 80 | 75 |
| 2 | 60 | 65 | 70 |
+---------+--------+--------+--------+
You have a database on MySQL namely school having three tables in it — Student, Subject, Teacher. Write a program to store these tables in three DataFrames.
import pandas as pd
import mysql.connector as sqltor
mycon = sqltor.connect(host = "localhost",
user = "root",
passwd = "mypass",
database = "school")
mdf = pd.read_sql("SELECT * FROM student;", mycon)
ndf = pd.read_sql("SELECT * FROM subject;", mycon)
ldf = pd.read_sql("SELECT * FROM teacher;", mycon)
print("Student dataframe:\n")
print(mdf)
print("Subject dataframe:\n")
print(ndf)
print("Teacher dataframe:\n")
print(ldf)
Student dataframe:
NO NAME AGE DEPARTMENT FEE SEX
0 1 PANKAJ 24 COMPUTER 120 M
1 2 SHALINI 21 HISTORY 200 F
2 3 SANJAY 22 HINDI 300 M
3 4 SUDHA 25 HISTORY 400 F
4 5 ANANYA 23 COMPUTER 450 F
Subject dataframe:
Subject_id name Teacher_id
0 101 math 201
1 102 science 202
2 103 english 203
3 104 hindi 204
Teacher dataframe:
No Name Age Department Salary Sex Dateofjoin
0 1 Jugal 34 Computer 12000 M 1997-01-10
1 2 Sharmila 31 History 20000 F 1998-03-24
2 3 Sandeep 32 Maths 30000 M 1996-12-12
3 4 Sangeeta 35 History 40000 F 1999-07-01
4 5 Rakesh 42 Maths 25000 M 1997-09-05
5 6 Shyam 50 History 30000 M 1998-06-27
6 7 Shiv Om 44 Computer 21000 M 1997-02-25
7 8 Shalakha 33 Maths 20000 F 1997-07-31
The DataFrame SDF stores the sales records of 100 salesmen. Write a program to store this as a table in database namely "company" on MySQL.
import pandas as pd
from sqlalchemy import create_engine
import pymysql
def sales(sdf):
engine = create_engine('mysql+pymysql://root:Mypass@localhost/company')
mycon = engine.connect()
sdf.to_sql('Records', mycon, if_exists = 'replace', index = False)
Consider the SDF DataFrame storing the sales records of 100 salesmen. Write a program that stores only the first 25 rows of the DataFrame in a table on MySQL database.
import pandas as pd
from sqlalchemy import create_engine
import pymysql
def sales(sdf):
engine = create_engine('mysql+pymysql://root:Mypass@localhost/company')
mycon = engine.connect()
first_25_rows = sdf.head(25)
first_25_rows.to_sql('Records', mycon, if_exists = 'replace', index = False)
The sales table of company database of MySQL stores the sales records of 100 salesmen. Write a program to load only those records in DataFrame which have made sales more than of 50000/-.
import pandas as pd
import mysql.connector as sqltor
mycon = sqltor.connect(host = "localhost", user = "root", passwd = "mypass", database = "company")
df = pd.read_sql("SELECT * FROM sales WHERE Sales > 50000;", mycon)
print(df)