Skip to content
Advertisement

How to select the best item in each group?

I have table reports:

+----+-----------+
| id | file_name |
+----+-----------+
| 1  | jan.xml   |
+----+-----------+
| 2  | jan.csv   |
+----+-----------+
| 3  | feb.csv   |
+----+-----------+

In human language: there are reports for each month. Each report could be in XML or CSV format. There could be 1-2 reports for each month in unique format.

I want to select the reports for all months, picking only 1 file for each month. The XML format is more preferable.

So, expected output is:

+----+-----------+
| id | file_name |
+----+-----------+
| 1  | jan.xml   |
+----+-----------+
| 3  | feb.csv   |
+----+-----------+

Explanation: the file jan.csv was excluded since there is more preferable report for that month: jan.xml.

Advertisement

Answer

As mentioned in the comments your data structure has a number of challenges. It really needs a column for ReportDate or something along those lines that is a date/datetime so you know which month the report belongs to. That would also give you something to sort by when you get your data back. Aside from those much needed improvements you can get the desired results from your sample data with something like this.

create table SomeFileTable
(
    id int
    , file_name varchar(10)
)

insert SomeFileTable
select 1, 'jan.xml' union all
select 2, 'jan.csv' union all
select 3, 'feb.csv'

select s.id
    , s.file_name
from
(
    select *
        , FileName = parsename(file_name, 2)
        , FileExtension = parsename(file_name, 1)
        , RowNum = ROW_NUMBER() over(partition by parsename(file_name, 2) order by case parsename(file_name, 1) when 'xml' then 1 else 2 end)
    from SomeFileTable
) s
where s.RowNum = 1
--ideally you would want to order the results but you don't have much of anything to work with in your data as a reliable sorting order since the dates are implied by the file name
User contributions licensed under: CC BY-SA
4 People found this is helpful
Advertisement