{"id":3793,"date":"2012-11-19T09:55:03","date_gmt":"2012-11-19T13:55:03","guid":{"rendered":"http:\/\/starlightcascade.ca\/blog\/?p=3793"},"modified":"2012-11-20T10:50:11","modified_gmt":"2012-11-20T14:50:11","slug":"apache-log-analysis","status":"publish","type":"post","link":"https:\/\/starlightcascade.ca\/blog\/2012\/11\/apache-log-analysis\/","title":{"rendered":"apache log analysis"},"content":{"rendered":"<p><a href=\"http:\/\/starlightcascade.ca\/blog\/wp-content\/uploads\/2012\/11\/apache-logo.jpg\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/starlightcascade.ca\/blog\/wp-content\/uploads\/2012\/11\/apache-logo-299x224.jpg\" alt=\"\" title=\"apache-logo\" width=\"299\" height=\"224\" class=\"alignright size-medium wp-image-3799\" srcset=\"https:\/\/starlightcascade.ca\/blog\/wp-content\/uploads\/2012\/11\/apache-logo-299x224.jpg 299w, https:\/\/starlightcascade.ca\/blog\/wp-content\/uploads\/2012\/11\/apache-logo-150x112.jpg 150w, https:\/\/starlightcascade.ca\/blog\/wp-content\/uploads\/2012\/11\/apache-logo.jpg 640w\" sizes=\"auto, (max-width: 299px) 100vw, 299px\" \/><\/a>Once in a while one wonders just how many search engines are out there.<br \/>\nI wrote up a Linux bash script, found on the net, to analyze the Apache web server logs for starlightcascade.ca.<br \/>\nThe script looks like this:<\/p>\n<p>>cat listip<br \/>\n#!\/bin\/bash<br \/>\n# Count requests per client host in the Apache access log, lowest count first.<br \/>\nd=$1  # domain name; required, but the log path below is hard-coded<br \/>\nHTTPDLOG=&quot;\/var\/log\/httpd\/starlightcascade-access.log&quot;<br \/>\n[ $# -eq 0 ] &#038;&#038; { echo &quot;Usage: $0 domain-name&quot;; exit 1; }<br \/>\nif [ -f &quot;$HTTPDLOG&quot; ]<br \/>\nthen<br \/>\n        # Field 1 of each log line is the client host.<br \/>\n        awk &#039;{ print $1 }&#039; &quot;$HTTPDLOG&quot; | sort | uniq -c | sort -n<br \/>\nelse<br \/>\n        echo &quot;$HTTPDLOG not found. Make sure the domain exists and is set up correctly.&quot;<br \/>\n        exit 1<br \/>\nfi<\/p>\n<p>When we run the script and pull the web search engine crawlers out of the results, we get the number of requests from each unique client host:<\/p>\n<p>[] .\/listip starlightcascade.ca | grep -i crawl<br \/>\n      1 baiduspider-123-125-71-100.crawl.baidu.com<br \/>\n      1 baiduspider-123-125-71-101.crawl.baidu.com<br \/>\n      1 baiduspider-123-125-71-109.crawl.baidu.com<br \/>\n      1 baiduspider-123-125-71-113.crawl.baidu.com<br \/>\n      1 baiduspider-123-125-71-114.crawl.baidu.com<br \/>\n      1 baiduspider-123-125-71-26.crawl.baidu.com<br \/>\n      1 baiduspider-123-125-71-43.crawl.baidu.com<br \/>\n      1 baiduspider-123-125-71-58.crawl.baidu.com<br \/>\n      1 baiduspider-123-125-71-71.crawl.baidu.com<br \/>\n      1 baiduspider-123-125-71-72.crawl.baidu.com<br \/>\n      1 baiduspider-123-125-71-75.crawl.baidu.com<br \/>\n      1 baiduspider-123-125-71-76.crawl.baidu.com<br \/>\n      1 baiduspider-123-125-71-77.crawl.baidu.com<br \/>\n      1 baiduspider-123-125-71-81.crawl.baidu.com<br \/>\n      1 baiduspider-123-125-71-85.crawl.baidu.com<br \/>\n      1 baiduspider-123-125-71-89.crawl.baidu.com<br \/>\n      1 baiduspider-123-125-71-91.crawl.baidu.com<br \/>\n      1 baiduspider-123-125-71-92.crawl.baidu.com<br \/>\n      1 baiduspider-123-125-71-95.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-103.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-110.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-136.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-138.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-147.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-148.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-153.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-157.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-158.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-159.crawl.baidu.com<br \/>\n    
  1 baiduspider-180-76-5-164.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-169.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-182.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-183.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-184.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-186.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-189.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-190.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-191.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-193.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-50.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-53.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-87.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-91.crawl.baidu.com<br \/>\n      1 baiduspider-180-76-5-93.crawl.baidu.com<br \/>\n      1 baiduspider-220-181-108-101.crawl.baidu.com<br \/>\n      1 baiduspider-220-181-108-102.crawl.baidu.com<br \/>\n      1 baiduspider-220-181-108-109.crawl.baidu.com<br \/>\n      1 baiduspider-220-181-108-123.crawl.baidu.com<br \/>\n      1 baiduspider-220-181-108-140.crawl.baidu.com<br \/>\n      1 baiduspider-220-181-108-151.crawl.baidu.com<br \/>\n      1 baiduspider-220-181-108-153.crawl.baidu.com<br \/>\n      1 baiduspider-220-181-108-165.crawl.baidu.com<br \/>\n      1 baiduspider-220-181-108-168.crawl.baidu.com<br \/>\n      1 baiduspider-220-181-108-169.crawl.baidu.com<br \/>\n      1 baiduspider-220-181-108-170.crawl.baidu.com<br \/>\n      1 baiduspider-220-181-108-171.crawl.baidu.com<br \/>\n      1 baiduspider-220-181-108-176.crawl.baidu.com<br \/>\n      1 baiduspider-220-181-108-180.crawl.baidu.com<br \/>\n      1 baiduspider-220-181-108-182.crawl.baidu.com<br \/>\n      1 baiduspider-220-181-108-183.crawl.baidu.com<br \/>\n      1 baiduspider-220-181-108-79.crawl.baidu.com<br \/>\n      1 baiduspider-220-181-108-82.crawl.baidu.com<br \/>\n      1 crawl-66-249-66-41.googlebot.com<br \/>\n      2 
baiduspider-180-76-5-100.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-137.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-149.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-150.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-151.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-155.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-160.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-161.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-162.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-166.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-171.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-172.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-180.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-181.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-188.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-192.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-197.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-48.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-51.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-52.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-56.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-59.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-67.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-92.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-98.crawl.baidu.com<br \/>\n      2 baiduspider-180-76-5-99.crawl.baidu.com<br \/>\n      2 baiduspider-220-181-108-149.crawl.baidu.com<br \/>\n      2 baiduspider-220-181-108-162.crawl.baidu.com<br \/>\n      3 baiduspider-180-76-5-111.crawl.baidu.com<br \/>\n      3 baiduspider-180-76-5-142.crawl.baidu.com<br \/>\n      3 baiduspider-180-76-5-154.crawl.baidu.com<br \/>\n      3 baiduspider-180-76-5-165.crawl.baidu.com<br \/>\n      3 baiduspider-180-76-5-167.crawl.baidu.com<br \/>\n      3 baiduspider-180-76-5-173.crawl.baidu.com<br \/>\n      3 baiduspider-180-76-5-177.crawl.baidu.com<br \/>\n      3 
baiduspider-180-76-5-178.crawl.baidu.com<br \/>\n      3 baiduspider-180-76-5-185.crawl.baidu.com<br \/>\n      3 baiduspider-180-76-5-54.crawl.baidu.com<br \/>\n      3 baiduspider-180-76-5-55.crawl.baidu.com<br \/>\n      3 baiduspider-180-76-5-57.crawl.baidu.com<br \/>\n      3 baiduspider-180-76-5-60.crawl.baidu.com<br \/>\n      3 baiduspider-180-76-5-61.crawl.baidu.com<br \/>\n      3 baiduspider-180-76-5-62.crawl.baidu.com<br \/>\n      3 baiduspider-180-76-5-66.crawl.baidu.com<br \/>\n      3 baiduspider-180-76-5-89.crawl.baidu.com<br \/>\n      3 baiduspider-180-76-5-94.crawl.baidu.com<br \/>\n      3 baiduspider-180-76-5-95.crawl.baidu.com<br \/>\n      4 baiduspider-180-76-5-156.crawl.baidu.com<br \/>\n      4 baiduspider-180-76-5-168.crawl.baidu.com<br \/>\n      4 baiduspider-180-76-5-88.crawl.baidu.com<br \/>\n      4 baiduspider-180-76-5-90.crawl.baidu.com<br \/>\n      4 baiduspider-180-76-5-97.crawl.baidu.com<br \/>\n      5 baiduspider-180-76-5-107.crawl.baidu.com<br \/>\n      5 baiduspider-180-76-5-144.crawl.baidu.com<br \/>\n      5 baiduspider-180-76-5-146.crawl.baidu.com<br \/>\n      5 baiduspider-180-76-5-65.crawl.baidu.com<br \/>\n      6 baiduspider-180-76-5-63.crawl.baidu.com<br \/>\n      7 baiduspider-180-76-5-101.crawl.baidu.com<br \/>\n     72 crawler.kalooga.com<br \/>\n    367 crawl-66-249-73-143.googlebot.com<br \/>\n    587 crawl-66-249-73-85.googlebot.com<\/p>\n<p>Baiduspider is the crawler for Baidu, the Chinese search engine.<\/p>\n<p>In addition, there are dozens to hundreds of break-in attempts via ssh and ftp on my servers every single day. Funny how, when I trace those IP addresses back, most end up in China.<br \/>\nAnd I think back to the <a href=\"http:\/\/www.cbc.ca\/news\/business\/story\/2012\/11\/16\/pol-the-house-zhang-junsai-chinese-ambassador-to-canada.html\" target=\"_blank\">Chinese Ambassador in the news this past week<\/a>:<\/p>\n<p>Beijing&#8217;s representative in Ottawa says Chinese firms are not involved in foreign espionage and he challenges anyone who says otherwise to produce evidence or keep quiet, in a rare interview airing Saturday on CBC Radio&#8217;s The House.<\/p>\n<p>In a word: bullshit, buddy.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Once in a while one wonders just how many search engines are out there. I wrote up a Linux bash script, found on the net, to analyze the Apache web server logs for starlightcascade.ca. The script looks like this. >cat listip #!\/bin\/bash d=$1 OUT=\/tmp\/spam.ip.$$ HTTPDLOG=&#8221;\/var\/log\/httpd\/starlightcascade-access.log&#8221; [ $# -eq 0 ] &#038;&#038; { echo &#8220;Usage: $0 domain-name&#8221;; 
[&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[15],"tags":[],"class_list":["post-3793","post","type-post","status-publish","format-standard","hentry","category-tech"],"_links":{"self":[{"href":"https:\/\/starlightcascade.ca\/blog\/wp-json\/wp\/v2\/posts\/3793","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/starlightcascade.ca\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/starlightcascade.ca\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/starlightcascade.ca\/blog\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/starlightcascade.ca\/blog\/wp-json\/wp\/v2\/comments?post=3793"}],"version-history":[{"count":0,"href":"https:\/\/starlightcascade.ca\/blog\/wp-json\/wp\/v2\/posts\/3793\/revisions"}],"wp:attachment":[{"href":"https:\/\/starlightcascade.ca\/blog\/wp-json\/wp\/v2\/media?parent=3793"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/starlightcascade.ca\/blog\/wp-json\/wp\/v2\/categories?post=3793"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/starlightcascade.ca\/blog\/wp-json\/wp\/v2\/tags?post=3793"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}