
SQL / Hibernate insert only if url not exists

I have a list of urls and a table that contains urls. I want to only insert if the url is not in the table.

Data in the Table: 
|id | url | ... |
|---| --- | --- |
| 1 | example.com | ... | 

List<String> urls = new ArrayList<>();
urls.add("example.com/");
urls.add("example.com/#");
urls.add("www.example.com/");
urls.add("https://www.example.com/");
urls.add("example.net");

After the insert, the table should contain:

Data in the Table: 
|id | url | ... |
|---| --- | --- |
| 1 | example.com | ... | 
| 2 | example.net | ... |

My current attempt is a method findByURL(url):List that I call for every URL in the list. If the returned list is empty, I insert the url into the table. Unfortunately, my query treats example.com and example.com/# as different URLs.

@Table(name = "url_to_edit")
@NamedQueries({
        @NamedQuery(name= UrlToEdit.FIND_BY_URL, query = "select urlToEdit from UrlToEdit urlToEdit where urlToEdit.url = :url")
})
@NoArgsConstructor
public class UrlToEdit { ... }

With my current solution, the table contains the following rows:

Data in the Table: 
|id | url | ... |
|---| --- | --- |
| 1 | example.com | ... | 
| 2 | example.com/ | ... | 
| 3 | example.com/# | ... | 
| 4 | www.example.com/ | ... | 
| 5 | https://www.example.com/ | ... | 
| 6 | example.net | ... | 


How can I tell SQL that these are the same URL? Or do I need some kind of pre-parser? And is it possible to do a bulk insert? My current code inserts the rows one at a time.

EDIT: I have multiple urls from one host, so I can't compare hostnames alone, e.g. example.com/test/, example.com/test/# and example.com/#.


Answer

I think you should transform the urls before storing them in the database; that way all your data is normalized and you won't have to compare every row manually. Adding a UNIQUE constraint on the url column of the table would help too.

As for the transformation, I think (though I'm not certain) that the following regex might work:

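 import java.util.regex.Matcher;
 import java.util.regex.Pattern;

 // Optional scheme, optional "www.", then the host up to the first slash
 Pattern URL_REGEX = Pattern.compile("(?:https?://)?(www\\.)?([^/]+).*");
 String url = "http://www.example.com/xxx";
 Matcher matcher = URL_REGEX.matcher(url);
 if (matcher.matches()) {
    url = matcher.group(2); // host only; the path is discarded
 }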

NOTE: I adapted the regular expression to fit your data, but I wouldn't consider example.com and www.example.com to be the same URL.
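
Since the EDIT in the question says that paths matter (example.com/test/ must stay distinct from example.com/), a host-only regex may collapse too much. Below is a minimal sketch of a pre-parser that keeps the path and removes only the scheme, a leading "www.", the fragment and trailing slashes. Those equivalence rules are an assumption inferred from the expected table in the question, and the names UrlNormalizer, normalize and newUrls are hypothetical, not part of Hibernate or of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; the normalization rules are assumptions based on the question's data.
class UrlNormalizer {

    // Reduce a URL to the form that is stored and compared.
    static String normalize(String url) {
        String s = url.trim();
        int hash = s.indexOf('#');
        if (hash >= 0) {
            s = s.substring(0, hash);               // drop the fragment
        }
        s = s.replaceFirst("(?i)^https?://", "");   // drop the scheme
        s = s.replaceFirst("(?i)^www\\.", "");      // drop a leading "www."
        s = s.replaceAll("/+$", "");                // drop trailing slashes
        int slash = s.indexOf('/');
        String host = slash < 0 ? s : s.substring(0, slash);
        String rest = slash < 0 ? "" : s.substring(slash);
        return host.toLowerCase(Locale.ROOT) + rest; // host is case-insensitive, path is kept as-is
    }

    // Normalize the candidates, drop duplicates and anything already stored.
    // 'existing' is assumed to hold the normalized urls already in the table.
    static List<String> newUrls(List<String> candidates, Set<String> existing) {
        Set<String> result = new LinkedHashSet<>();  // keeps insertion order, removes duplicates
        for (String candidate : candidates) {
            String normalized = normalize(candidate);
            if (!existing.contains(normalized)) {
                result.add(normalized);
            }
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "example.com/", "example.com/#", "www.example.com/",
                "https://www.example.com/", "example.net");
        Set<String> existing = new HashSet<>(Arrays.asList("example.com"));
        System.out.println(newUrls(urls, existing)); // prints [example.net]
    }
}
```

With the list filtered like this, the remaining urls could be persisted in a single transaction instead of running one lookup per url, and the UNIQUE constraint would still act as a safety net if two writers race.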

User contributions licensed under: CC BY-SA