Compare two data frames and find number of nulls

Question

I have a problem. I have data frame 1, named “df”, and data frame 2, named “dfP1”. I want to compare the unique rows that exist in the column “Campo a Validar” from “dfP1” against the columns in “df…

Accepted Answer

It was a bit challenging to determine what you needed, but this might come close to your ideal solution.

    import pandas as pd
    # numpy.NaN was removed in NumPy 2.0; numpy.nan works in all versions
    from numpy import nan

    # Assuming that these dictionaries accurately reflect
    # your DataFrames' contents, the following might work:
    _df = {
        "c1":  [1.0, 3.0, 5.0, 7.0],
        "c2":  [1.0, 3.0, 5.0, 7.0],
        "c3":  [1.0, 3.0, 5.0, 7.0],
        "c4":  [1.0, 3.0, 5.0, 7.0],
        "Nº Línea Cliente": [
            "Hay algo",
            "Hay algo",
            "Hay algo",
            nan],
        "c6":  [1.0, 3.0, 5.0, 7.0],
        "c7":  [1.0, 3.0, 5.0, 7.0],
        "c8":  [1.0, 3.0, 5.0, 7.0],
        "c9":  [1.0, 3.0, 5.0, 7.0],
        "c10": [1.0, 3.0, 5.0, 7.0],
    }

    Campo_a_Validar = [
        "Nº Línea Cliente"
        for campo in range(4)]
    Campo_a_Validar.append("TIPO DE GARANTIA 1")

    _dfP1 = {
        "ID_Val": [1, 2, 3, 4, 5],
        "Tipo_Validación": [1, 2, 3, 4, 1],
        "Campo_a_Validar": Campo_a_Validar,
    }

    # Initializing the DataFrames
    df = pd.DataFrame(_df)
    dfP1 = pd.DataFrame(_dfP1)

    def analizar_para_nulos(_df_, _dfP1_):
        try:
            # Note: groupby(...).nunique() counts distinct non-null values
            # per group; NaN rows are dropped by groupby and are not counted.
            contar_nulos  = lambda DF, ColName: DF.groupby([ColName])[ColName].nunique()
            nulos_de_df   = contar_nulos(_df_, "Nº Línea Cliente")
            nulos_de_dfP1 = contar_nulos(_dfP1_, "Campo_a_Validar")
            assert(
                nulos_de_df.values[0] == nulos_de_dfP1.values[0]
            )
            num_nulos = nulos_de_df
            return num_nulos.values[0]
        except AssertionError:
            return 0

    # Check whether the number of unique rows is
    # equal to the number of unique rows in
    # the other table
    is_coincidence = analizar_para_nulos(df, dfP1)

    if is_coincidence:
        # Put the result in the first row and pad the rest with empty strings
        base = [is_coincidence]
        base.extend([""
            for position in range(len(df.c1) - 1)])
        num_columns = len(df.T)
        df.insert(
            loc=num_columns,
            column="Numeros_de_Nulos",
            value=base
        )
        print(df)
    else:
        print(df)

Output:

        c1   c2   c3   c4 Nº Línea Cliente   c6   c7   c8   c9  c10 Numeros_de_Nulos
    0  1.0  1.0  1.0  1.0         Hay algo  1.0  1.0  1.0  1.0  1.0                1
    1  3.0  3.0  3.0  3.0         Hay algo  3.0  3.0  3.0  3.0  3.0
    2  5.0  5.0  5.0  5.0         Hay algo  5.0  5.0  5.0  5.0  5.0
    3  7.0  7.0  7.0  7.0              NaN  7.0  7.0  7.0  7.0  7.0
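
If the goal is literally the one in the title (the number of nulls in each "df" column that is named in "dfP1"), a more direct route might be to call isna().sum() on each of those columns. The sketch below is only one reading of the question, which is cut off above. The sample frames are hypothetical stand-ins for the ones posted as images, and the helper name count_nulls_per_field and the result name null_count are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the asker's frames (the originals were images).
df = pd.DataFrame({
    "c1": [1.0, 3.0, 5.0, 7.0],
    "Nº Línea Cliente": ["Hay algo", "Hay algo", "Hay algo", np.nan],
})
dfP1 = pd.DataFrame({
    "ID_Val": [1, 2, 3, 4, 5],
    "Campo_a_Validar": ["Nº Línea Cliente"] * 4 + ["TIPO DE GARANTIA 1"],
})


def count_nulls_per_field(data, rules, field_col="Campo_a_Validar"):
    """Map each unique field name in rules[field_col] to the number of
    nulls in the matching column of data.

    Field names with no matching column in data get pd.NA instead of a count.
    """
    fields = rules[field_col].dropna().unique()
    counts = {
        field: data[field].isna().sum() if field in data.columns else pd.NA
        for field in fields
    }
    return pd.Series(counts, name="null_count")


null_counts = count_nulls_per_field(df, dfP1)
print(null_counts)
```

With these sample frames, "Nº Línea Cliente" has one null, and "TIPO DE GARANTIA 1" has no matching column in "df", so it is reported as missing rather than as zero. To attach the count to every rule row, dfP1["Campo_a_Validar"].map(null_counts) should work, since mapping with a Series looks values up by its index.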

Answer