I am looking to fetch all requests to xmlrpc.php and wp-login.php and have them wildcarded in the statement.
But this presents an issue, as it doesn’t output the data in only two rows for xmlrpc and wp-login, but also includes the URLs with a query attached. Hoping to have it include every URL requested, but combine them to show as just xmlrpc.php or wp-login.php
I am a mysql n00b and was playing around with a substr replace and group_concat but couldn’t get it to work.
WITH 
  subq AS (
    SELECT url, COUNT(url) AS count
    FROM `flywheel-production.fastly_logs.ingress_logs`
    WHERE timestamp > TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 DAY) 
  AND (url LIKE "/wp-login.php%" OR  url LIKE "/xmlrpc.php%")
  AND site_hash = "btmpuroizf"
    GROUP BY url
  )
SELECT 
  url,
  count,
  ROUND(count / (SELECT SUM(count) FROM subq) * 100, 2) AS percent
FROM subq
ORDER BY count DESC
Any help would be much appreciated. Thanks!
Advertisement
Answer
For BigQuery Standard SQL
Below adjusted query should do the “trick”
#standardSQL WITH subq AS ( SELECT REGEXP_EXTRACT(url, r'(.*?)(?:?|$)') url, COUNT(url) AS COUNT FROM `flywheel-production.fastly_logs.ingress_logs` WHERE timestamp > TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 DAY) AND (url LIKE "/wp-login.php%" OR url LIKE "/xmlrpc.php%") AND site_hash = "btmpuroizf" GROUP BY url ) SELECT url, COUNT, ROUND(COUNT / (SELECT SUM(COUNT) FROM subq) * 100, 2) AS percent FROM subq ORDER BY COUNT DESC