
R – get a vector that tells me whether a value of another vector is its first appearance or not

I have a data frame of sales with three columns: the code of the customer, the month the customer bought that item, and the year.

A customer can buy something in September and then make another purchase in December, and so appear twice. But I’m interested in counting the completely new customers by month and year.

So I have thought of writing a loop with some checks, using %in% to build a boolean vector that tells me whether a customer is new or not, and then counting by month and year with SQL using this new vector.

But I’m wondering if there’s a specific function or a better way to do that.

This is an example of the data I would like to have:

To put it more simply: the data frame is sorted by date, and I’m interested in a vector (new_customer) that tells me whether the customer purchased something for the first time or not. For example, customer 25 bought something on the first day and then bought something again four days later, so on that second purchase they are not a new customer. The same can be seen with customers 17 and 22.


Answer

I created dummy data myself with an id, a month in numeric format, and a year.
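The original dummy data is not shown here, so the following is only a minimal sketch of what such a data frame might look like. The name `sales` and all the values are made up for illustration; only the customer ids 25, 17 and 22 are borrowed from the question.

```r
# Hypothetical dummy data: one row per purchase.
# All values are invented for illustration, not taken from the original answer.
sales <- data.frame(
  id    = c(25, 17, 22, 25, 30, 17, 22, 41),            # customer code
  month = c(9, 9, 10, 12, 12, 1, 2, 2),                 # month as a number
  year  = c(2019, 2019, 2019, 2019, 2019, 2020, 2020, 2020)
)

print(sales)
```

Customers 25, 17 and 22 each appear twice, so they should each be counted as new only once.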

Then group by id and arrange by year and month (the order matters). Then use filter() and row_number() to keep each customer’s first row.
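The steps above could be sketched with dplyr roughly as follows. This is an assumption about what the missing code looked like, not the original answer's code; the data frame `sales`, its values, and the result names `first_purchases` and `new_by_month` are hypothetical, and the data is defined inside the snippet so that it runs on its own.

```r
library(dplyr)

# Hypothetical dummy data (invented values): one row per purchase.
sales <- data.frame(
  id    = c(25, 17, 22, 25, 30, 17, 22, 41),
  month = c(9, 9, 10, 12, 12, 1, 2, 2),
  year  = c(2019, 2019, 2019, 2019, 2019, 2020, 2020, 2020)
)

# Keep only the earliest purchase of each customer.
first_purchases <- sales %>%
  group_by(id) %>%
  arrange(year, month) %>%        # chronological order; this is why order matters
  filter(row_number() == 1) %>%   # first row within each id group
  ungroup()

# Count the new customers per year and month.
new_by_month <- first_purchases %>%
  count(year, month, name = "new_customers")

print(new_by_month)
```

If the flag vector itself is wanted rather than the filtered rows, replacing `filter(row_number() == 1)` with `mutate(new_customer = row_number() == 1)` should keep every purchase and add the new_customer column described in the question.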

User contributions licensed under: CC BY-SA