Apache log analysis
Once in a while one wonders just how many search engines are out there.
I put together a Linux bash script, based on one I found on the net, to analyze the Apache web server logs for starlightcascade.ca.
The script looks like this.
>cat listip
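#!/bin/bash
# Count requests per remote host in the Apache access log.
# Usage: ./listip domain-name
[ $# -eq 0 ] && { echo "Usage: $0 domain-name" >&2; exit 1; }
d=$1
# The log path is hard-coded for starlightcascade.ca; $d only appears
# in the error message below.
HTTPDLOG="/var/log/httpd/starlightcascade-access.log"
if [ -f "$HTTPDLOG" ]; then
    # Field 1 of each log line is the remote host (an IP address, or a
    # hostname if the server resolves addresses). Count hits per host,
    # least active first.
    awk '{ print $1 }' "$HTTPDLOG" | sort | uniq -c | sort -n
else
    echo "$HTTPDLOG not found. Make sure $d exists and is set up correctly." >&2
    exit 1
fi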
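Running the script and filtering its results for web search engine crawlers gives the number of hits from each unique remote host. The log records resolved hostnames rather than raw IP addresses, which is what lets a simple grep for "crawl" pick out the spiders: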
>./listip starlightcascade.ca | grep -i crawl
1 baiduspider-123-125-71-100.crawl.baidu.com
1 baiduspider-123-125-71-101.crawl.baidu.com
1 baiduspider-123-125-71-109.crawl.baidu.com
1 baiduspider-123-125-71-113.crawl.baidu.com
1 baiduspider-123-125-71-114.crawl.baidu.com
1 baiduspider-123-125-71-26.crawl.baidu.com
1 baiduspider-123-125-71-43.crawl.baidu.com
1 baiduspider-123-125-71-58.crawl.baidu.com
1 baiduspider-123-125-71-71.crawl.baidu.com
1 baiduspider-123-125-71-72.crawl.baidu.com
1 baiduspider-123-125-71-75.crawl.baidu.com
1 baiduspider-123-125-71-76.crawl.baidu.com
1 baiduspider-123-125-71-77.crawl.baidu.com
1 baiduspider-123-125-71-81.crawl.baidu.com
1 baiduspider-123-125-71-85.crawl.baidu.com
1 baiduspider-123-125-71-89.crawl.baidu.com
1 baiduspider-123-125-71-91.crawl.baidu.com
1 baiduspider-123-125-71-92.crawl.baidu.com
1 baiduspider-123-125-71-95.crawl.baidu.com
1 baiduspider-180-76-5-103.crawl.baidu.com
1 baiduspider-180-76-5-110.crawl.baidu.com
1 baiduspider-180-76-5-136.crawl.baidu.com
1 baiduspider-180-76-5-138.crawl.baidu.com
1 baiduspider-180-76-5-147.crawl.baidu.com
1 baiduspider-180-76-5-148.crawl.baidu.com
1 baiduspider-180-76-5-153.crawl.baidu.com
1 baiduspider-180-76-5-157.crawl.baidu.com
1 baiduspider-180-76-5-158.crawl.baidu.com
1 baiduspider-180-76-5-159.crawl.baidu.com
1 baiduspider-180-76-5-164.crawl.baidu.com
1 baiduspider-180-76-5-169.crawl.baidu.com
1 baiduspider-180-76-5-182.crawl.baidu.com
1 baiduspider-180-76-5-183.crawl.baidu.com
1 baiduspider-180-76-5-184.crawl.baidu.com
1 baiduspider-180-76-5-186.crawl.baidu.com
1 baiduspider-180-76-5-189.crawl.baidu.com
1 baiduspider-180-76-5-190.crawl.baidu.com
1 baiduspider-180-76-5-191.crawl.baidu.com
1 baiduspider-180-76-5-193.crawl.baidu.com
1 baiduspider-180-76-5-50.crawl.baidu.com
1 baiduspider-180-76-5-53.crawl.baidu.com
1 baiduspider-180-76-5-87.crawl.baidu.com
1 baiduspider-180-76-5-91.crawl.baidu.com
1 baiduspider-180-76-5-93.crawl.baidu.com
1 baiduspider-220-181-108-101.crawl.baidu.com
1 baiduspider-220-181-108-102.crawl.baidu.com
1 baiduspider-220-181-108-109.crawl.baidu.com
1 baiduspider-220-181-108-123.crawl.baidu.com
1 baiduspider-220-181-108-140.crawl.baidu.com
1 baiduspider-220-181-108-151.crawl.baidu.com
1 baiduspider-220-181-108-153.crawl.baidu.com
1 baiduspider-220-181-108-165.crawl.baidu.com
1 baiduspider-220-181-108-168.crawl.baidu.com
1 baiduspider-220-181-108-169.crawl.baidu.com
1 baiduspider-220-181-108-170.crawl.baidu.com
1 baiduspider-220-181-108-171.crawl.baidu.com
1 baiduspider-220-181-108-176.crawl.baidu.com
1 baiduspider-220-181-108-180.crawl.baidu.com
1 baiduspider-220-181-108-182.crawl.baidu.com
1 baiduspider-220-181-108-183.crawl.baidu.com
1 baiduspider-220-181-108-79.crawl.baidu.com
1 baiduspider-220-181-108-82.crawl.baidu.com
1 crawl-66-249-66-41.googlebot.com
2 baiduspider-180-76-5-100.crawl.baidu.com
2 baiduspider-180-76-5-137.crawl.baidu.com
2 baiduspider-180-76-5-149.crawl.baidu.com
2 baiduspider-180-76-5-150.crawl.baidu.com
2 baiduspider-180-76-5-151.crawl.baidu.com
2 baiduspider-180-76-5-155.crawl.baidu.com
2 baiduspider-180-76-5-160.crawl.baidu.com
2 baiduspider-180-76-5-161.crawl.baidu.com
2 baiduspider-180-76-5-162.crawl.baidu.com
2 baiduspider-180-76-5-166.crawl.baidu.com
2 baiduspider-180-76-5-171.crawl.baidu.com
2 baiduspider-180-76-5-172.crawl.baidu.com
2 baiduspider-180-76-5-180.crawl.baidu.com
2 baiduspider-180-76-5-181.crawl.baidu.com
2 baiduspider-180-76-5-188.crawl.baidu.com
2 baiduspider-180-76-5-192.crawl.baidu.com
2 baiduspider-180-76-5-197.crawl.baidu.com
2 baiduspider-180-76-5-48.crawl.baidu.com
2 baiduspider-180-76-5-51.crawl.baidu.com
2 baiduspider-180-76-5-52.crawl.baidu.com
2 baiduspider-180-76-5-56.crawl.baidu.com
2 baiduspider-180-76-5-59.crawl.baidu.com
2 baiduspider-180-76-5-67.crawl.baidu.com
2 baiduspider-180-76-5-92.crawl.baidu.com
2 baiduspider-180-76-5-98.crawl.baidu.com
2 baiduspider-180-76-5-99.crawl.baidu.com
2 baiduspider-220-181-108-149.crawl.baidu.com
2 baiduspider-220-181-108-162.crawl.baidu.com
3 baiduspider-180-76-5-111.crawl.baidu.com
3 baiduspider-180-76-5-142.crawl.baidu.com
3 baiduspider-180-76-5-154.crawl.baidu.com
3 baiduspider-180-76-5-165.crawl.baidu.com
3 baiduspider-180-76-5-167.crawl.baidu.com
3 baiduspider-180-76-5-173.crawl.baidu.com
3 baiduspider-180-76-5-177.crawl.baidu.com
3 baiduspider-180-76-5-178.crawl.baidu.com
3 baiduspider-180-76-5-185.crawl.baidu.com
3 baiduspider-180-76-5-54.crawl.baidu.com
3 baiduspider-180-76-5-55.crawl.baidu.com
3 baiduspider-180-76-5-57.crawl.baidu.com
3 baiduspider-180-76-5-60.crawl.baidu.com
3 baiduspider-180-76-5-61.crawl.baidu.com
3 baiduspider-180-76-5-62.crawl.baidu.com
3 baiduspider-180-76-5-66.crawl.baidu.com
3 baiduspider-180-76-5-89.crawl.baidu.com
3 baiduspider-180-76-5-94.crawl.baidu.com
3 baiduspider-180-76-5-95.crawl.baidu.com
4 baiduspider-180-76-5-156.crawl.baidu.com
4 baiduspider-180-76-5-168.crawl.baidu.com
4 baiduspider-180-76-5-88.crawl.baidu.com
4 baiduspider-180-76-5-90.crawl.baidu.com
4 baiduspider-180-76-5-97.crawl.baidu.com
5 baiduspider-180-76-5-107.crawl.baidu.com
5 baiduspider-180-76-5-144.crawl.baidu.com
5 baiduspider-180-76-5-146.crawl.baidu.com
5 baiduspider-180-76-5-65.crawl.baidu.com
6 baiduspider-180-76-5-63.crawl.baidu.com
7 baiduspider-180-76-5-101.crawl.baidu.com
72 crawler.kalooga.com
367 crawl-66-249-73-143.googlebot.com
587 crawl-66-249-73-85.googlebot.com
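To put a number on the opening question, the per-host lines can be rolled up into one total per search engine. The sketch below is a hypothetical helper (tally_by_domain is an illustrative name, not part of the original script). It assumes the last two labels of each hostname identify the engine, which holds for the baidu.com, googlebot.com and kalooga.com names above but not for hosts under country-code second-level domains such as .co.uk.

```shell
#!/bin/bash
# Roll per-host crawler counts ("count hostname" lines, as printed by
# listip) up to one total per domain.
# Output columns: total hits, number of distinct hosts, domain.
# Assumption: the last two hostname labels name the search engine.
tally_by_domain() {
    awk 'NF == 2 {
            n = split($2, label, "[.]")
            domain = (n >= 2) ? (label[n-1] "." label[n]) : $2
            hits[domain] += $1
            hosts[domain]++
         }
         END {
            for (d in hits)
                printf "%d %d %s\n", hits[d], hosts[d], d
         }' | sort -n
}

# Usage:
# ./listip starlightcascade.ca | grep -i crawl | tally_by_domain
```

For the listing above this collapses everything into three domains: baidu.com, googlebot.com and kalooga.com.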
Baiduspider is the web crawler for Baidu, the dominant search engine in China.
In addition, there are dozens to hundreds of break-in attempts via SSH and FTP against my servers every single day. Funny how, when I trace back those IP addresses, most of them end up in China.
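The SSH side of that can be tallied the same way the Apache log was. Below is a minimal sketch that counts failed password attempts per source address and looks up the registry country for a single address. It assumes sshd logs to /var/log/secure (the Red Hat-style location, which matches the /var/log/httpd path above; Debian and Ubuntu use /var/log/auth.log instead), and the function names count_failed_ssh and ip_country are hypothetical. FTP servers each log in their own format, so they are not covered here.

```shell
#!/bin/bash
# Count failed SSH password attempts per source IP, least active first.
# Assumption: sshd logs via syslog to /var/log/secure; pass another
# path as $1 (e.g. /var/log/auth.log) if your system differs.
# Matches lines such as:
#   sshd[1234]: Failed password for root from 203.0.113.5 port 4242 ssh2
#   sshd[1234]: Failed password for invalid user admin from 203.0.113.5 port 4242 ssh2
count_failed_ssh() {
    grep 'sshd\[[0-9]*]: Failed password' "${1:-/var/log/secure}" |
        # Scan backwards so the address is taken from the last "from",
        # even if the attempted user name happens to be "from".
        awk '{ for (i = NF - 1; i >= 1; i--) if ($i == "from") { print $(i + 1); break } }' |
        sort | uniq -c | sort -n
}

# Print the first "country:" value from whois for one address.
# whois output is not standardized; an empty result means "unknown".
ip_country() {
    whois "$1" | grep -i -m 1 '^country:' | awk '{ print $2 }'
}

# Usage (hypothetical): the ten noisiest sources with their countries.
# count_failed_ssh | tail -n 10 | while read -r count ip; do
#     echo "$count $ip $(ip_country "$ip")"
# done
```

Each whois lookup is a network query and registries may rate-limit them, so it makes sense to run ip_country only on the handful of busiest addresses.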
Which brings to mind the Chinese ambassador who was in the news this past week:
Beijing’s representative in Ottawa says Chinese firms are not involved in foreign espionage and he challenges anyone who says otherwise to produce evidence or keep quiet, in a rare interview airing Saturday on CBC Radio’s The House.
In a word: bullshit, buddy.