
SQL / Hibernate: insert only if the URL does not exist

I have a list of URLs and a table that contains URLs. I want to insert a URL only if it is not already in the table.

After the insert, the data table should contain:

My current attempt is a method findByURL(url): List that I call for every URL in the list. If the returned list is empty, I insert the URL into the table. Unfortunately, my statement treats example.com and example.com# as different URLs.

With my current solution, the table contains the following rows:

How can I tell SQL that these are the same URL? Or do I need some kind of pre-parser? And is it possible to do a bulk insert? My current code inserts the URLs one at a time.

EDIT: I have multiple URLs from one host, so I can’t go by hostname alone, e.g. example.com/test/, example.com/test/# and example.com/#.


Answer

I think you should transform the URLs before storing them in the database; that way all your data is normalized and you won’t have to check every row manually. Adding a UNIQUE constraint on the url column of the table would help too.
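To illustrate the normalize-first idea together with the bulk-insert part of the question, here is a minimal sketch in plain Java. It assumes the URLs already stored in the table are normalized and have been loaded into a Set. The class UrlBatch, its methods, and the rule of dropping a single trailing "#" are hypothetical and chosen only for this example; they are not part of Hibernate.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical helper for illustration; not part of Hibernate or any library.
final class UrlBatch {

    // Returns the normalized candidates that are not stored yet, without
    // duplicates, in first-seen order. 'existing' is assumed to contain
    // URLs that were normalized with the same rule.
    static List<String> newUrls(List<String> candidates,
                                Set<String> existing,
                                UnaryOperator<String> normalizer) {
        Set<String> seen = new HashSet<>(existing);
        List<String> toInsert = new ArrayList<>();
        for (String candidate : candidates) {
            String normalized = normalizer.apply(candidate);
            // Set.add returns false when the value is already present.
            if (seen.add(normalized)) {
                toInsert.add(normalized);
            }
        }
        return toInsert;
    }

    // Assumed rule for this sketch: drop one trailing "#" (an empty fragment).
    static String stripTrailingHash(String url) {
        return url.endsWith("#") ? url.substring(0, url.length() - 1) : url;
    }
}
```

The returned list could then be persisted in a single transaction, for example with JDBC batching enabled through the hibernate.jdbc.batch_size setting. The UNIQUE constraint is still worth keeping, since an in-memory check like this one does not protect against concurrent writers.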

As for the transformation, I think (though I have not verified it) that the following regex might work:

NOTE: I adapted the regular expression to fit your data, but I wouldn’t consider example.com and www.example.com to be the same URL.
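As a rough idea of what a regex-based normalization could look like in Java, here is a sketch. It is not the regex from this answer; the patterns are assumptions made for this example. It drops everything from the first "#" onward and removes a leading "www." only when the caller opts in. The class UrlNormalizer and the stripWww flag are hypothetical names, and the URLs are assumed to carry no scheme, as in the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class UrlNormalizer {

    // Matches a "#" and everything after it, up to the end of the string.
    private static final Pattern FRAGMENT = Pattern.compile("#.*$");

    // Matches a leading "www."; applied only when the caller asks for it.
    private static final Pattern LEADING_WWW = Pattern.compile("^www\\.");

    static String normalize(String url, boolean stripWww) {
        // Trim surrounding whitespace, then remove the fragment part.
        String result = FRAGMENT.matcher(url.trim()).replaceFirst("");
        if (stripWww) {
            result = LEADING_WWW.matcher(result).replaceFirst("");
        }
        return result;
    }
}
```

Leaving stripWww off keeps www.example.com and example.com distinct, in line with the note above. This sketch also drops non-empty fragments such as #section; if those should count as separate URLs, the first pattern would have to match only a trailing "#".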

User contributions licensed under: CC BY-SA