I am currently working on a project to build a database on professor’s research paper database. This is my first time building a database(never had experience with MYSQL) and I am learning as I am doing it.
I was able to use an api to get the data, for example:
{“authorId”: “1773022”, “url”: “https://www.semanticscholar.org/author/1773022”, “papers”: [{“paperId”: “1253d2704580a74e776ae211602cfde71532c057”, “title”: “Nonlinear Schrodinger Kernel for hardware acceleration of machine learning”}, {“paperId”: “71f49f1e3ccb2e92d606db9b3db66c669a163bb6”, “title”: “Task-Driven Learning of Spatial Combinations of Visual Features”}, {“paperId”: “bb35ae8a50de54c9ca29fbdf1ea2fbbb4e8c4662”, “title”: “Statistical Learning of Visual Feature Hierarchies”}]}
How would I use python to turn this into a table so I can use it to build my database?
I am trying to make a table where columns are: Paper ID|Title|
Advertisement
Answer
From https://www.w3schools.com/python/python_mysql_getstarted.asp
Install:
python -m pip install mysql-connector-python
Over view
Create a connection:
import mysql.connector mydb = mysql.connector.connect( host="localhost", user="yourusername", password="yourpassword" )
Create a cursor to interact with the connection you made, then create a Database:
mycursor = mydb.cursor() mycursor.execute("CREATE DATABASE mydatabase") mydb.close()
After the database has been created, you can start connecting with it like so:
mydb = mysql.connector.connect( host="localhost", user="yourusername", password="yourpassword", database="mydatabase" )
Note that you don’t have to actually close your connection and reopen it to connect to that database, yet I don’t see the docs mentioning anything about interacting with that specific database after you’ve created it, so I’m going to close it after I create the database…
Create the table with the proper datatypes and constraints:
mycursor.execute("CREATE TABLE customers (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255), address VARCHAR(255))")
Then insert into it and commit the entries:
sql = "INSERT INTO customers (name, address) VALUES (%s, %s)" val = ("John", "Highway 21") mycursor.execute(sql, val) mydb.commit()
Close the connection:
mydb.close()
Your Specific Case
This is the data you have provided:
data = { "authorId": "1773022", "url": "https://www.semanticscholar.org/author/1773022", "papers": [ { "paperId": "1253d2704580a74e776ae211602cfde71532c057", "title": "Nonlinear Schrodinger Kernel for hardware acceleration of machine learning" }, { "paperId": "71f49f1e3ccb2e92d606db9b3db66c669a163bb6", "title": "Task-Driven Learning of Spatial Combinations of Visual Features" }, { "paperId": "bb35ae8a50de54c9ca29fbdf1ea2fbbb4e8c4662", "title": "Statistical Learning of Visual Feature Hierarchies" } ] }
Granted that I don’t know all the details, but based on the data given, I’d assume that you would want a table for:
- Authors – id (auto increment pk), authorId (varchar), url (varchar),
- Papers – id (auto increment pk), authorId (varchar fk), paperId, title (varchar)
Modify this as you please:
import mysql.connector mydb = mysql.connector.connect( host="localhost", user="yourusername", password="yourpassword", database="mydatabase" ) mycursor = mydb.cursor() # create the tables if they don't exist authors = False papers = False mycursor.execute("SHOW TABLES") for e in mycursor.fetchall(): if "authors" == e[0]: authors = True if "papers" == e[0]: papers = True if not authors: mycursor.execute("CREATE TABLE authors (id INT AUTO_INCREMENT PRIMARY KEY, authorId VARCHAR(255), url VARCHAR(255))") if not papers: mycursor.execute("CREATE TABLE papers (id INT AUTO_INCREMENT PRIMARY KEY, authorId VARCHAR(255), paperId VARCHAR(255), title VARCHAR(255))") # insert into them - recall that "data" is what gets returned by your api. I am assuming 1 entry. a = data["authorId"] u = data["url"] ps = data["papers"] # I am going to check if the authorId already exists. mycursor.execute("SELECT * FROM authors WHERE authorId = '" + a + "'") as = mycursor.fetchall() if len(as) == 0: sql = "INSERT INTO authors (authorId, url) VALUES (%s, %s)" val = (a, u) mycursor.execute(sql, val) mydb.commit() # for each paper in papers for p in ps: # Consider this: Suppose you had multiple authors for 1 paper. Provided that they are entered into the authors table (which they should be), you can add an entry to the papers table for each author. # check to see if the paper is already in the database with the specific author, paperId, and title mycursor.execute("SELECT * FROM papers WHERE authorId = '" + a + "' AND paperId = '" + p["paperId"] + "' AND title = '" + p["title"] + "'") pc = mycursor.fetchall() # if they are no entries, insert it if len(pc) == 0: sql = "INSERT INTO papers (authorId, paperId, title) VALUES (%s, %s, %s)" val = (a, p["paperId"], p["title"]) mycursor.execute(sql, val) mydb.commit() mydb.close()
Side note:
I have not tested this. I will be amazed myself if this works first try. If you come across any errors, I can do my best to trouble shoot.
EDIT:
Once again, I have not tested this. It may not work, but the idea is there. Please check out this article: What are some good Python ORM solutions?
Save this as Models.py
import mysql.connector import os # -------------------------------------------------- # General Model # -------------------------------------------------- # table columns is a dictionary that looks like: # table_columns = {"column name": "constaints"} # ex: table_columns = {"authorId": "VARCHAR(255)", "next col": "VARCHAR(255)"}, etc. # data is the same as table_columns, but contains the data to be added for 1 row class Model: def __init__(self, table_columns={}, table_name=None, data=None, host=os.environ["mysqlhost"], user=os.environ["mysqluser"], password=os.environ["mysqlpassword"], database=os.environ["database"]): self.db = None self.cursor = None self.table_name = table_name self.table_columns = table_columns self.data = data if host is not None and user is not None and password is not None and database is not None: try: self.db = mysql.connector.connect( host=host, user=user, password=password, database=database ) except Exception as e: print(e) #print("One or more credentials were incorrect! Could not connect to you database!") if self.db is not None: self.cursor = self.db.cursor() if table_name is not None: self.cursor.execute("SHOW TABLES;") for e in self.cursor.fetchall(): if e[0] != self.table_name: pk = [[e, self.table_columns[e]] for e in self.table_columns if "primary key" in e.lower()] if len(pk) == 1: pk = " ".join(pk) + ", " del self.table_columns[pk[0]] else: pk = "" try: table_string = 'CREATE TABLE "' + self.table_name + '"(' + pk + ", ".join([" ".join(['"' + "_".join(c.split()) + '"', self.table_columns[c].upper()]) for c in self.table_columns]) + ');' self.cursor.execute(table_string) print("Created table with name: " + self.table_name) except Exception as e: self.db.rollback() print(e) def insert(self): if self.data is not None: pkname = "" try: self.cursor.execute('SHOW KEYS FROM "(%s)" WHERE Key_name = (%s);', (self.table_name, 'PRIMARY')) pkname = self.cursor.fetchall()[0] if pkname in self.table_columns and pkname not in self.data: del self.table_columns[pkname] elif pkname not in self.table_columns and pkname in self.data: del self.table_columns[pkname] except Exception as e: print("Could not get primary key name!") print(e) try: self.cursor.execute('SHOW COLUMNS FROM "' + self.table_name + '";') self.table_columns = {e: "" for e in self.cursor.fetchall()} except Exception as e: self.db.rollback() print("Could not find table with name " + self.table_name) print(e) flag = True for e in self.data: if e not in self.table_columns: flag = False if flag: if len(self.data) == len(self.table_columns): col = ["'" + e + "'" if e[0] != "'" and e[-1] !="'" else e for e in self.data] data = [self.data[e] for e in self.data] sql = "INSERT INTO %s (%s) VALUES (%s)" val = ('"' + self.table_name + '"', ", ".join(col), ", ".join(data)) try: self.cursor.execute(sql, val) self.save() except Exception as e: print("Could not insert into " + self.table_name) print(e) else: print("Found unexpected data. Try an insert or update query.") def save(self): committed = False try: self.db.commit() committed = True except Exception as e: self.db.rollback() print(e) if committed: self.db.close() self.db = None self.cursor = None
Usage:
import os import Model as m os.environ["mysqlhost"] = "host" os.environ["mysqluser"] = "mysqluser" os.environ["mysqlpassword"] = "password" os.environ["database"] = "database" # Presumably, this is exactly what your API returns. data = { "authorId": "118985833", "url": "semanticscholar.org/author/118985833", "papers": [ { "paperId": "0b698cd2269d9b52c75dbdf1994dbc9a98fb16c8", "title": "The development of self-care workshop to prevent vicarious traumatization in training therapists working with sexually exploited children" } ] } data_for_author = {"authorId": data["authorId"], "url": data["url"]} # table_name=None # table_columns={} # data=None model = m.Model(table_name="authors", data=data_for_author) model.insert()
This may not even work – but I did it to give you some hope. There are easier ways to insert data into your database. You need to look into ORMs – the django (a webserver library for python) library has one native to it, that makes it super easy to manage your database.