Given a table like this:
How could I use SQL (T-SQL/SQL Server Mgt Studio), R, or Python to do the following in this order:
For each unique ID, update col A with the maximum value of A from all rows and then delete all rows except the one with the highest value in col B. I need to end up with only one row for a given ID.
Written another way; I have duplicate data and need to delete duplicate rows but only after ensuring I have the maximum value of col A from any duplicates applied to the row with the maximum value of col B. I want to only keep one row for a given ID but the row I want to keep is the one with the max col B. If there are multiple identical ID & col B rows, delete all but one.
library(dplyr) your_data %>% group_by(ID) %>% mutate(A = max(A)) %>% slice(which.max(B)) %>% ungroup()
library(data.table) setDT(your_data) your_data[, A := max(A), by = ID][, .SD[which.max(B)], by = ID]