Skip to content
Advertisement

Evaluating a variable using the IN() Function

I’m trying to resolve a datastep variable in the in() function. I have a dataset that looks like the following:

The samples tested vary by run. Normally the only sample levels in the dataset are the ones in the range provided by “Samples Tested”. However occasionally this is not the case, and it can get messy. For example the last one I worked on looked like this:

In this case I’d want to drop all rows with sample levels that were not included in Samples Tested, which I did by manually adding the code:

But what I want to do is have this done automatically by looking at the samples tested column. It’s easy enough to turn a “-” into a “:”, but where I’m stuck is getting the IN() function to recognize or resolve a variable. I would like code that looks like this: if sample_level not in(Samples_Tested) then delete; where samples_tested has been transformed to be something that the IN() function can handle. I’m also not opposed to using proc sql; if anyone has a solution that they think will work. I know you can do things like

But the problem is that the samples tested varies by run and there could be 16 different runs. Hopefully I’ve explained the challenge clearly enough. Thanks for taking the time to read this and thanks in advance for your help!

Advertisement

Answer

Assuming the values of SAMPLES_TESTED is constant for each value of RUN you could use it to generate the selection criteria. For example you could use a data _null_ step to write a WHERE statement to a file and then %include that code into another data step.

Note: IN is an operator and not a function.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement