This works, but the outputs are not matched on the index (Date). Instead, each new column is added starting after the previous dataframe's last row, i.e. the data is stacked "on top" of each other, so the Date index is repeated. Is there a way to iterate and create columns that are matched by Date?
```python
indexReturnData = pd.DataFrame()
indexcount = int(input("how many indices do you want to use? Enter a quantity: "))

i = 0
while i < indexcount:
    indexList = pd.read_sql_query('SELECT * FROM Instruments', conn)
    indexList = indexList[['InstrumentName', 'InstrumentID']]

    indexListFind = input('Enter partial index name: ')
    indexList = indexList[indexList['InstrumentName'].str.contains(indexListFind, case=False)]
    # need to add an if/else statement in case of errors....
    indexList = pd.DataFrame(indexList)
    print(indexList)

    indexID = input('Pick/Type in an INDEX list ID ("InstrumentID") From the List Above: ')
    indexName = indexList.query('InstrumentID ==' + str(indexID))['InstrumentName']
    indexName = list(indexName)

    indexReturns = pd.read_sql_query("""SELECT * FROM InstrumentPrices
                                        WHERE InstrumentID=""" + indexID, conn)
    indexReturns = indexReturns.filter(['ReportingDate', 'Returns'])
    indexReturns = indexReturns.rename(columns={'ReportingDate': 'Date',
                                                'Returns': indexName[0]})
    indexReturns = indexReturns.set_index('Date')

    indexReturnData = indexReturnData.append(indexReturns)
    i += 1
```
Output:
```
          Date  S&P500  S&P600
308   9/1/1995   0.042
309  10/1/1995  -0.004
310  11/1/1995   0.044
311  12/1/1995   0.019
…            …       …       …
603   4/1/2020   0.128
604   5/1/2020   0.048
605   6/1/2020   0.020
606   7/1/2020   0.056
623   9/1/1995           0.025
624  10/1/1995          -0.050
625  11/1/1995           0.038
626  12/1/1995           0.016
…            …       …       …
918   4/1/2020           0.126
919   5/1/2020           0.041
920   6/1/2020           0.036
921   7/1/2020           0.040
```
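The stacking happens because `DataFrame.append` (like `pd.concat` with the default `axis=0`) concatenates row-wise rather than aligning on the index. A minimal sketch with made-up values reproducing the effect:

```python
import pandas as pd

# Two per-index frames, each with its own column and a Date index
# (hypothetical sample values, mirroring the output above).
sp500 = pd.DataFrame({"S&P500": [0.042, -0.004]},
                     index=pd.Index(["9/1/1995", "10/1/1995"], name="Date"))
sp600 = pd.DataFrame({"S&P600": [0.025, -0.050]},
                     index=pd.Index(["9/1/1995", "10/1/1995"], name="Date"))

# Row-wise concatenation: the Date values repeat, and each column is
# NaN for the rows that came from the other frame.
stacked = pd.concat([sp500, sp600])
print(stacked)
```

Each date appears twice in `stacked`, with exactly one real number and one NaN per column, which is exactly the repeated-index shape described in the question.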
Thanks!
Answer
Just based on what your current output is and what I think your desired output is, I think you can get away with just a `df.groupby('Date').sum()`. Running that will group any duplicates in the 'Date' column and sum all the values it finds for each column. If I'm understanding right, each column will only have a single value for each date-row, so it'll "sum" that single number: that is, it'll return that number.
I copied the little output section you have above (and removed the blank rows), ran `df.groupby('Date').sum()`, and got this:
```
           S&P500  S&P600
Date
10/1/1995  -0.004  -0.050
11/1/1995   0.044   0.038
12/1/1995   0.019   0.016
4/1/2020    0.128   0.126
5/1/2020    0.048   0.041
6/1/2020    0.020   0.036
7/1/2020    0.056   0.040
9/1/1995    0.042   0.025
```
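A small self-contained sketch of the same fix, built from a hypothetical stacked frame shaped like the question's output (duplicate dates, one real value per column per date, NaN elsewhere):

```python
import pandas as pd

# Hypothetical stacked frame: repeated Date index, one non-NaN value
# per column per date (the shape produced by repeated .append()).
stacked = pd.DataFrame(
    {"S&P500": [0.042, -0.004, None, None],
     "S&P600": [None, None, 0.025, -0.050]},
    index=pd.Index(["9/1/1995", "10/1/1995", "9/1/1995", "10/1/1995"],
                   name="Date"),
)

# groupby('Date').sum() collapses the duplicate dates. sum() skips NaN
# by default, and each group has exactly one real value per column, so
# the "sum" is just that value.
merged = stacked.groupby("Date").sum()
print(merged)
```

Note that because the dates are strings, the grouped result is sorted lexicographically (hence `9/1/1995` last in the output above); parsing the column with `pd.to_datetime` first would give chronological order.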