This works, but the outputs are not matched on the index (Date). Instead, each new column is added starting after the previous dataframe's last row, i.e. the data is stacked "on top" of each other, so the Date index is repeated. Is there a way to iterate and create columns that are matched by Date?
```python
indexReturnData = pd.DataFrame()
indexcount = int(input("how many indices do you want to use? Enter a quantity: "))

i = 0
while i < indexcount:
    indexList = pd.read_sql_query('SELECT * FROM Instruments', conn)
    indexList = indexList[['InstrumentName', 'InstrumentID']]

    indexListFind = input('Enter partial index name: ')
    indexList = indexList[indexList['InstrumentName'].str.contains(indexListFind, case=False)]
    # need to add an if/else statement in case of errors....
    indexList = pd.DataFrame(indexList)
    print(indexList)

    indexID = input('Pick/Type in an INDEX list ID ("InstrumentID") From the List Above: ')
    indexName = indexList.query('InstrumentID ==' + str(indexID))['InstrumentName']
    indexName = list(indexName)

    indexReturns = pd.read_sql_query("""SELECT * FROM InstrumentPrices
                                        WHERE InstrumentID=""" + indexID, conn)
    indexReturns = indexReturns.filter(['ReportingDate', 'Returns'])
    indexReturns = indexReturns.rename(columns={'ReportingDate': 'Date',
                                                'Returns': indexName[0]})
    indexReturns = indexReturns.set_index('Date')

    indexReturnData = indexReturnData.append(indexReturns)
    i += 1
```
Output:
```
          Date  S&P500  S&P600
308   9/1/1995   0.042
309  10/1/1995  -0.004
310  11/1/1995   0.044
311  12/1/1995   0.019
…            …       …       …
603   4/1/2020   0.128
604   5/1/2020   0.048
605   6/1/2020   0.020
606   7/1/2020   0.056
623   9/1/1995           0.025
624  10/1/1995          -0.050
625  11/1/1995           0.038
626  12/1/1995           0.016
…            …       …       …
918   4/1/2020           0.126
919   5/1/2020           0.041
920   6/1/2020           0.036
921   7/1/2020           0.040
```
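The stacking happens because `DataFrame.append` (like `pd.concat` with the default `axis=0`) concatenates row-wise rather than aligning on the index. A minimal sketch with made-up values reproducing the effect:

```python
import pandas as pd

# Two per-index frames, each with its own column and a Date index
# (hypothetical sample values, mirroring the output above).
sp500 = pd.DataFrame({"S&P500": [0.042, -0.004]},
                     index=pd.Index(["9/1/1995", "10/1/1995"], name="Date"))
sp600 = pd.DataFrame({"S&P600": [0.025, -0.050]},
                     index=pd.Index(["9/1/1995", "10/1/1995"], name="Date"))

# Row-wise concatenation: the Date values repeat, and each column is
# NaN for the rows that came from the other frame.
stacked = pd.concat([sp500, sp600])
print(stacked)
```

Each date appears twice in `stacked`, with exactly one real number and one NaN per column, which is exactly the repeated-index shape described in the question.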
Thanks!
Answer
Just based on what your current output is and what I think your desired output is, I think you can get away with just a `df.groupby('Date').sum()`. Running that will group any duplicates in the 'Date' column and sum all the values it finds for each column. If I'm understanding right, each column will only have a single value for each date-row, so it'll "sum" that single number: that is, it'll return that number.
I copied the little output section you have above (and removed the blank rows), ran `df.groupby('Date').sum()`, and got this:
```
           S&P500  S&P600
Date
10/1/1995  -0.004  -0.050
11/1/1995   0.044   0.038
12/1/1995   0.019   0.016
4/1/2020    0.128   0.126
5/1/2020    0.048   0.041
6/1/2020    0.020   0.036
7/1/2020    0.056   0.040
9/1/1995    0.042   0.025
```
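A small self-contained sketch of the same fix, built from a hypothetical stacked frame shaped like the question's output (duplicate dates, one real value per column per date, NaN elsewhere):

```python
import pandas as pd

# Hypothetical stacked frame: repeated Date index, one non-NaN value
# per column per date (the shape produced by repeated .append()).
stacked = pd.DataFrame(
    {"S&P500": [0.042, -0.004, None, None],
     "S&P600": [None, None, 0.025, -0.050]},
    index=pd.Index(["9/1/1995", "10/1/1995", "9/1/1995", "10/1/1995"],
                   name="Date"),
)

# groupby('Date').sum() collapses the duplicate dates. sum() skips NaN
# by default, and each group has exactly one real value per column, so
# the "sum" is just that value.
merged = stacked.groupby("Date").sum()
print(merged)
```

Note that because the dates are strings, the grouped result is sorted lexicographically (hence `9/1/1995` last in the output above); parsing the column with `pd.to_datetime` first would give chronological order.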