I am currently working on a project to build a database on professor’s research paper database. This is my first time building a database(never had experience with MYSQL) and I am learning as I am doing it.
I was able to use an api to get the data, for example:
{“authorId”: “1773022”, “url”: “https://www.semanticscholar.org/author/1773022”, “papers”: [{“paperId”: “1253d2704580a74e776ae211602cfde71532c057”, “title”: “Nonlinear Schrodinger Kernel for hardware acceleration of machine learning”}, {“paperId”: “71f49f1e3ccb2e92d606db9b3db66c669a163bb6”, “title”: “Task-Driven Learning of Spatial Combinations of Visual Features”}, {“paperId”: “bb35ae8a50de54c9ca29fbdf1ea2fbbb4e8c4662”, “title”: “Statistical Learning of Visual Feature Hierarchies”}]}
How would I use python to turn this into a table so I can use it to build my database?
I am trying to make a table where columns are: Paper ID|Title|
Advertisement
Answer
From https://www.w3schools.com/python/python_mysql_getstarted.asp
Install:
python -m pip install mysql-connector-python
Over view
Create a connection:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword"
)
Create a cursor to interact with the connection you made, then create a Database:
mycursor = mydb.cursor()
mycursor.execute("CREATE DATABASE mydatabase")
mydb.close()
After the database has been created, you can start connecting with it like so:
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
Note that you don’t have to actually close your connection and reopen it to connect to that database, yet I don’t see the docs mentioning anything about interacting with that specific database after you’ve created it, so I’m going to close it after I create the database…
Create the table with the proper datatypes and constraints:
mycursor.execute("CREATE TABLE customers (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255), address VARCHAR(255))")
Then insert into it and commit the entries:
sql = "INSERT INTO customers (name, address) VALUES (%s, %s)"
val = ("John", "Highway 21")
mycursor.execute(sql, val)
mydb.commit()
Close the connection:
mydb.close()
Your Specific Case
This is the data you have provided:
data = {
"authorId": "1773022",
"url": "https://www.semanticscholar.org/author/1773022",
"papers": [
{
"paperId": "1253d2704580a74e776ae211602cfde71532c057",
"title": "Nonlinear Schrodinger Kernel for hardware acceleration of machine learning"
},
{
"paperId": "71f49f1e3ccb2e92d606db9b3db66c669a163bb6",
"title": "Task-Driven Learning of Spatial Combinations of Visual Features"
},
{
"paperId": "bb35ae8a50de54c9ca29fbdf1ea2fbbb4e8c4662",
"title": "Statistical Learning of Visual Feature Hierarchies"
}
]
}
Granted that I don’t know all the details, but based on the data given, I’d assume that you would want a table for:
- Authors – id (auto increment pk), authorId (varchar), url (varchar),
- Papers – id (auto increment pk), authorId (varchar fk), paperId, title (varchar)
Modify this as you please:
import mysql.connector
mydb = mysql.connector.connect(
host="localhost",
user="yourusername",
password="yourpassword",
database="mydatabase"
)
mycursor = mydb.cursor()
# create the tables if they don't exist
authors = False
papers = False
mycursor.execute("SHOW TABLES")
for e in mycursor.fetchall():
if "authors" == e[0]:
authors = True
if "papers" == e[0]:
papers = True
if not authors:
mycursor.execute("CREATE TABLE authors (id INT AUTO_INCREMENT PRIMARY KEY, authorId VARCHAR(255), url VARCHAR(255))")
if not papers:
mycursor.execute("CREATE TABLE papers (id INT AUTO_INCREMENT PRIMARY KEY, authorId VARCHAR(255), paperId VARCHAR(255), title VARCHAR(255))")
# insert into them - recall that "data" is what gets returned by your api. I am assuming 1 entry.
a = data["authorId"]
u = data["url"]
ps = data["papers"]
# I am going to check if the authorId already exists.
mycursor.execute("SELECT * FROM authors WHERE authorId = '" + a + "'")
as = mycursor.fetchall()
if len(as) == 0:
sql = "INSERT INTO authors (authorId, url) VALUES (%s, %s)"
val = (a, u)
mycursor.execute(sql, val)
mydb.commit()
# for each paper in papers
for p in ps:
# Consider this: Suppose you had multiple authors for 1 paper. Provided that they are entered into the authors table (which they should be), you can add an entry to the papers table for each author.
# check to see if the paper is already in the database with the specific author, paperId, and title
mycursor.execute("SELECT * FROM papers WHERE authorId = '" + a + "' AND paperId = '" + p["paperId"] + "' AND title = '" + p["title"] + "'")
pc = mycursor.fetchall()
# if they are no entries, insert it
if len(pc) == 0:
sql = "INSERT INTO papers (authorId, paperId, title) VALUES (%s, %s, %s)"
val = (a, p["paperId"], p["title"])
mycursor.execute(sql, val)
mydb.commit()
mydb.close()
Side note:
I have not tested this. I will be amazed myself if this works first try. If you come across any errors, I can do my best to trouble shoot.
EDIT:
Once again, I have not tested this. It may not work, but the idea is there. Please check out this article: What are some good Python ORM solutions?
Save this as Models.py
import mysql.connector
import os
# --------------------------------------------------
# General Model
# --------------------------------------------------
# table columns is a dictionary that looks like:
# table_columns = {"column name": "constaints"}
# ex: table_columns = {"authorId": "VARCHAR(255)", "next col": "VARCHAR(255)"}, etc.
# data is the same as table_columns, but contains the data to be added for 1 row
class Model:
def __init__(self, table_columns={}, table_name=None, data=None, host=os.environ["mysqlhost"], user=os.environ["mysqluser"], password=os.environ["mysqlpassword"], database=os.environ["database"]):
self.db = None
self.cursor = None
self.table_name = table_name
self.table_columns = table_columns
self.data = data
if host is not None and user is not None and password is not None and database is not None:
try:
self.db = mysql.connector.connect(
host=host,
user=user,
password=password,
database=database
)
except Exception as e:
print(e)
#print("One or more credentials were incorrect! Could not connect to you database!")
if self.db is not None:
self.cursor = self.db.cursor()
if table_name is not None:
self.cursor.execute("SHOW TABLES;")
for e in self.cursor.fetchall():
if e[0] != self.table_name:
pk = [[e, self.table_columns[e]] for e in self.table_columns if "primary key" in e.lower()]
if len(pk) == 1:
pk = " ".join(pk) + ", "
del self.table_columns[pk[0]]
else:
pk = ""
try:
table_string = 'CREATE TABLE "' + self.table_name + '"(' + pk + ", ".join([" ".join(['"' + "_".join(c.split()) + '"', self.table_columns[c].upper()]) for c in self.table_columns]) + ');'
self.cursor.execute(table_string)
print("Created table with name: " + self.table_name)
except Exception as e:
self.db.rollback()
print(e)
def insert(self):
if self.data is not None:
pkname = ""
try:
self.cursor.execute('SHOW KEYS FROM "(%s)" WHERE Key_name = (%s);', (self.table_name, 'PRIMARY'))
pkname = self.cursor.fetchall()[0]
if pkname in self.table_columns and pkname not in self.data:
del self.table_columns[pkname]
elif pkname not in self.table_columns and pkname in self.data:
del self.table_columns[pkname]
except Exception as e:
print("Could not get primary key name!")
print(e)
try:
self.cursor.execute('SHOW COLUMNS FROM "' + self.table_name + '";')
self.table_columns = {e: "" for e in self.cursor.fetchall()}
except Exception as e:
self.db.rollback()
print("Could not find table with name " + self.table_name)
print(e)
flag = True
for e in self.data:
if e not in self.table_columns:
flag = False
if flag:
if len(self.data) == len(self.table_columns):
col = ["'" + e + "'" if e[0] != "'" and e[-1] !="'" else e for e in self.data]
data = [self.data[e] for e in self.data]
sql = "INSERT INTO %s (%s) VALUES (%s)"
val = ('"' + self.table_name + '"', ", ".join(col), ", ".join(data))
try:
self.cursor.execute(sql, val)
self.save()
except Exception as e:
print("Could not insert into " + self.table_name)
print(e)
else:
print("Found unexpected data. Try an insert or update query.")
def save(self):
committed = False
try:
self.db.commit()
committed = True
except Exception as e:
self.db.rollback()
print(e)
if committed:
self.db.close()
self.db = None
self.cursor = None
Usage:
import os
import Model as m
os.environ["mysqlhost"] = "host"
os.environ["mysqluser"] = "mysqluser"
os.environ["mysqlpassword"] = "password"
os.environ["database"] = "database"
# Presumably, this is exactly what your API returns.
data = {
"authorId": "118985833",
"url": "semanticscholar.org/author/118985833",
"papers": [
{
"paperId": "0b698cd2269d9b52c75dbdf1994dbc9a98fb16c8",
"title": "The development of self-care workshop to prevent vicarious traumatization in training therapists working with sexually exploited children"
}
]
}
data_for_author = {"authorId": data["authorId"], "url": data["url"]}
# table_name=None
# table_columns={}
# data=None
model = m.Model(table_name="authors", data=data_for_author)
model.insert()
This may not even work – but I did it to give you some hope. There are easier ways to insert data into your database. You need to look into ORMs – the django (a webserver library for python) library has one native to it, that makes it super easy to manage your database.