Skip to content
Advertisement

How do I get the data extracted from API to my database

I am currently working on a project to build a database on professor’s research paper database. This is my first time building a database(never had experience with MYSQL) and I am learning as I am doing it.

I was able to use an api to get the data, for example:

{“authorId”: “1773022”, “url”: “https://www.semanticscholar.org/author/1773022”, “papers”: [{“paperId”: “1253d2704580a74e776ae211602cfde71532c057”, “title”: “Nonlinear Schrodinger Kernel for hardware acceleration of machine learning”}, {“paperId”: “71f49f1e3ccb2e92d606db9b3db66c669a163bb6”, “title”: “Task-Driven Learning of Spatial Combinations of Visual Features”}, {“paperId”: “bb35ae8a50de54c9ca29fbdf1ea2fbbb4e8c4662”, “title”: “Statistical Learning of Visual Feature Hierarchies”}]}

How would I use python to turn this into a table so I can use it to build my database?

I am trying to make a table where columns are: Paper ID|Title|

Advertisement

Answer

From https://www.w3schools.com/python/python_mysql_getstarted.asp

Install:

python -m pip install mysql-connector-python

Over view

Create a connection:

import mysql.connector

mydb = mysql.connector.connect(
  host="localhost",
  user="yourusername",
  password="yourpassword"
)

Create a cursor to interact with the connection you made, then create a Database:

mycursor = mydb.cursor()

mycursor.execute("CREATE DATABASE mydatabase")

mydb.close()

After the database has been created, you can start connecting with it like so:

mydb = mysql.connector.connect(
  host="localhost",
  user="yourusername",
  password="yourpassword",
  database="mydatabase"
)

Note that you don’t have to actually close your connection and reopen it to connect to that database, yet I don’t see the docs mentioning anything about interacting with that specific database after you’ve created it, so I’m going to close it after I create the database…

Create the table with the proper datatypes and constraints:

mycursor.execute("CREATE TABLE customers (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255), address VARCHAR(255))")

Then insert into it and commit the entries:

sql = "INSERT INTO customers (name, address) VALUES (%s, %s)"
val = ("John", "Highway 21")
mycursor.execute(sql, val)

mydb.commit()

Close the connection:

mydb.close()

Your Specific Case

This is the data you have provided:

data = {
    "authorId": "1773022",
    "url": "https://www.semanticscholar.org/author/1773022",
    "papers": [
        {
            "paperId": "1253d2704580a74e776ae211602cfde71532c057",
            "title": "Nonlinear Schrodinger Kernel for hardware acceleration of machine learning"
        }, 
        {
            "paperId": "71f49f1e3ccb2e92d606db9b3db66c669a163bb6",
            "title": "Task-Driven Learning of Spatial Combinations of Visual Features"
        },
        {
            "paperId": "bb35ae8a50de54c9ca29fbdf1ea2fbbb4e8c4662",
            "title": "Statistical Learning of Visual Feature Hierarchies"
        }
    ]
}

Granted that I don’t know all the details, but based on the data given, I’d assume that you would want a table for:

  1. Authors – id (auto increment pk), authorId (varchar), url (varchar),
  2. Papers – id (auto increment pk), authorId (varchar fk), paperId, title (varchar)

Modify this as you please:

import mysql.connector

mydb = mysql.connector.connect(
  host="localhost",
  user="yourusername",
  password="yourpassword",
  database="mydatabase"
)

mycursor = mydb.cursor()

# create the tables if they don't exist
authors = False
papers = False
mycursor.execute("SHOW TABLES")

for e in mycursor.fetchall():
    if "authors" == e[0]:
        authors = True
    if "papers" == e[0]:
        papers = True

if not authors:
    mycursor.execute("CREATE TABLE authors (id INT AUTO_INCREMENT PRIMARY KEY, authorId VARCHAR(255), url VARCHAR(255))")

if not papers:
    mycursor.execute("CREATE TABLE papers (id INT AUTO_INCREMENT PRIMARY KEY, authorId VARCHAR(255), paperId VARCHAR(255), title VARCHAR(255))")

# insert into them - recall that "data" is what gets returned by your api. I am assuming 1 entry.
a = data["authorId"]
u = data["url"]
ps = data["papers"]

# I am going to check if the authorId already exists.
mycursor.execute("SELECT * FROM authors WHERE authorId = '" + a + "'")
as = mycursor.fetchall()
if len(as) == 0:
    sql = "INSERT INTO authors (authorId, url) VALUES (%s, %s)"
    val = (a, u)
    mycursor.execute(sql, val)
    mydb.commit()

# for each paper in papers
for p in ps:
    # Consider this: Suppose you had multiple authors for 1 paper. Provided that they are entered into the authors table (which they should be), you can add an entry to the papers table for each author.
    # check to see if the paper is already in the database with the specific author, paperId, and title
    mycursor.execute("SELECT * FROM papers WHERE authorId = '" + a + "' AND paperId = '" + p["paperId"] + "' AND title = '" + p["title"] + "'")
    pc = mycursor.fetchall()
    # if they are no entries, insert it
    if len(pc) == 0:
        sql = "INSERT INTO papers (authorId, paperId, title) VALUES (%s, %s, %s)"
        val = (a, p["paperId"], p["title"])
        mycursor.execute(sql, val)
        mydb.commit()

mydb.close()

Side note:

I have not tested this. I will be amazed myself if this works first try. If you come across any errors, I can do my best to trouble shoot.


EDIT:

Once again, I have not tested this. It may not work, but the idea is there. Please check out this article: What are some good Python ORM solutions?

Save this as Models.py

import mysql.connector
import os


# --------------------------------------------------
# General Model
# --------------------------------------------------

# table columns is a dictionary that looks like:
# table_columns = {"column name": "constaints"}
# ex: table_columns = {"authorId": "VARCHAR(255)", "next col": "VARCHAR(255)"}, etc.
# data is the same as table_columns, but contains the data to be added for 1 row
class Model:
    def __init__(self, table_columns={}, table_name=None, data=None, host=os.environ["mysqlhost"], user=os.environ["mysqluser"], password=os.environ["mysqlpassword"], database=os.environ["database"]):
        self.db = None
        self.cursor = None
        self.table_name = table_name
        self.table_columns = table_columns
        self.data = data
        if host is not None and user is not None and password is not None and database is not None:
            try:
                self.db = mysql.connector.connect(
                    host=host,
                    user=user,
                    password=password,
                    database=database
                )
            except Exception as e:
                print(e)
                #print("One or more credentials were incorrect! Could not connect to you database!")
        if self.db is not None:
            self.cursor = self.db.cursor()
            if table_name is not None:
                self.cursor.execute("SHOW TABLES;")
                for e in self.cursor.fetchall():
                    if e[0] != self.table_name:
                        pk = [[e, self.table_columns[e]] for e in self.table_columns if "primary key" in e.lower()]
                        if len(pk) == 1:
                            pk = " ".join(pk) + ", "
                            del self.table_columns[pk[0]]
                        else:
                            pk = ""
                        try:
                            table_string = 'CREATE TABLE "' + self.table_name + '"(' + pk + ", ".join([" ".join(['"' + "_".join(c.split()) + '"', self.table_columns[c].upper()]) for c in self.table_columns]) + ');'
                            self.cursor.execute(table_string)
                            print("Created table with name: " + self.table_name)
                        except Exception as e:
                            self.db.rollback()
                            print(e)

    def insert(self):
        if self.data is not None:
            pkname = ""
            try:
                self.cursor.execute('SHOW KEYS FROM "(%s)" WHERE Key_name = (%s);', (self.table_name, 'PRIMARY'))
                pkname = self.cursor.fetchall()[0]
                if pkname in self.table_columns and pkname not in self.data:
                    del self.table_columns[pkname]
                elif pkname not in self.table_columns and pkname in self.data:
                    del  self.table_columns[pkname]
            except Exception as e:
                print("Could not get primary key name!")
                print(e)
            try:
                self.cursor.execute('SHOW COLUMNS FROM "' + self.table_name + '";')
                self.table_columns = {e: "" for e in self.cursor.fetchall()}
            except Exception as e:
                self.db.rollback()
                print("Could not find table with name " + self.table_name)
                print(e)
            flag = True
            for e in self.data:
                if e not in self.table_columns:
                    flag = False
            if flag:
                if len(self.data) == len(self.table_columns):
                    col = ["'" + e + "'" if e[0] != "'" and e[-1] !="'" else e for e in self.data]
                    data = [self.data[e] for e in self.data]
                    sql = "INSERT INTO %s (%s) VALUES (%s)"
                    val = ('"' + self.table_name + '"', ", ".join(col), ", ".join(data))
                    try:
                        self.cursor.execute(sql, val)
                        self.save()
                    except Exception as e:
                        print("Could not insert into " + self.table_name)
                        print(e)
            else:
                print("Found unexpected data. Try an insert or update query.")

    def save(self):
        committed = False
        try:
            self.db.commit()
            committed = True
        except Exception as e:
            self.db.rollback()
            print(e)
        if committed:
            self.db.close()
            self.db = None
            self.cursor = None

Usage:

import os
import Model as m

os.environ["mysqlhost"] = "host"
os.environ["mysqluser"] = "mysqluser"
os.environ["mysqlpassword"] = "password"
os.environ["database"] = "database"

# Presumably, this is exactly what your API returns.
data = {
    "authorId": "118985833",
    "url": "semanticscholar.org/author/118985833",
    "papers": [
        {
            "paperId": "0b698cd2269d9b52c75dbdf1994dbc9a98fb16c8",
            "title": "The development of self-care workshop to prevent vicarious traumatization in training therapists working with sexually exploited children"
        }
    ]
}

data_for_author = {"authorId": data["authorId"], "url": data["url"]}

# table_name=None
# table_columns={}
# data=None
model = m.Model(table_name="authors", data=data_for_author)
model.insert()

This may not even work – but I did it to give you some hope. There are easier ways to insert data into your database. You need to look into ORMs – the django (a webserver library for python) library has one native to it, that makes it super easy to manage your database.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement