Changkun's Blog

Science and art, life in between.

Simple Log Analysis

Published at: 2014-10-29   |   Reading: 1 min

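Today P哥 told me he had written a crawler to scrape my website for fun, so out of curiosity I took a look at how he had been crawling it.

Calling this "analysis" makes it sound fancier than it is. I originally planned to write a Python script, but then decided to just use awk for a quick look. I'll deploy a proper analysis platform later when I have time.

This is the log format: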

218.5.46.14 - - [26/Jun/2013:19:21:28 +0800] "GET / HTTP/1.1" 200 1115 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1" -
218.5.46.14 - - [26/Jun/2013:19:21:29 +0800] "GET /favicon.ico HTTP/1.1" 404 564 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1" -
218.5.46.14 - - [26/Jun/2013:19:21:45 +0800] "-" 400 0 "-" "-" -
220.181.126.47 - - [26/Jun/2013:19:29:02 +0800] "GET / HTTP/1.1" 200 2197 "http://koudaic.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; Tablet PC 2.0)" -
220.181.132.199 - - [26/Jun/2013:19:31:57 +0800] "GET / HTTP/1.1" 200 1115 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.11 (KHTML, like Gecko) DumpRenderTree/0.0.0.0 Safari/536.11" -
220.181.132.18 - - [26/Jun/2013:19:32:11 +0800] "GET / HTTP/1.1" 200 1115 "-" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.11 (KHTML, like Gecko) DumpRenderTree/0.0.0.0 Safari/536.11" -
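As a rough illustration of how one line in this format breaks into fields, here is a minimal sketch using awk. The `line` variable is just a hypothetical holder for one of the sample entries, with the user agent shortened. The field positions assume a well-formed request line; an entry like the `"-" 400 0` one above has fewer fields, so the status code is not in `$9` there.

```shell
# One entry from the sample above (user agent shortened for readability).
line='220.181.126.47 - - [26/Jun/2013:19:29:02 +0800] "GET / HTTP/1.1" 200 2197 "http://koudaic.com/" "Mozilla/4.0" -'

# Default whitespace splitting: $1 = client IP, $9 = status code, $10 = response size.
printf '%s\n' "$line" | awk '{print $1, $9, $10}'

# Splitting on double quotes instead: $2 = request line, $4 = referer.
printf '%s\n' "$line" | awk -F'"' '{print $2; print $4}'
```

Splitting on the quote character is convenient for the quoted parts of the line (request, referer, user agent), since those can contain spaces themselves.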

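For a simple analysis, grep, wc, tr, sort, and awk are all you need. By default awk splits each input line on whitespace; you can also pass -F "***" (with *** standing for the separator you want) to specify a different delimiter. From there we can sort the output, use uniq to count how often each value occurs, and sort once more to see which IP made the most requests. There is no need to analyze the whole log either: we can filter it with grep first.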

# Take the client IP (first field), count each one, and list the most frequent first.
awk '{print $1}' access.log | sort | uniq -c | sort -rn
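The grep pre-filtering mentioned above is not shown in the command, so here is a minimal sketch of one way it could look. The helper name `top_ips`, the temporary file, and the `'GET '` pattern are assumptions made for illustration; the sample lines are taken from the excerpt above with the user agents shortened.

```shell
# Build a small sample log from the excerpt above (user agents shortened).
log=$(mktemp)
cat > "$log" <<'EOF'
218.5.46.14 - - [26/Jun/2013:19:21:28 +0800] "GET / HTTP/1.1" 200 1115 "-" "Mozilla/5.0" -
218.5.46.14 - - [26/Jun/2013:19:21:29 +0800] "GET /favicon.ico HTTP/1.1" 404 564 "-" "Mozilla/5.0" -
218.5.46.14 - - [26/Jun/2013:19:21:45 +0800] "-" 400 0 "-" "-" -
220.181.126.47 - - [26/Jun/2013:19:29:02 +0800] "GET / HTTP/1.1" 200 2197 "http://koudaic.com/" "Mozilla/4.0" -
EOF

# Hypothetical helper: keep only lines matching a pattern, then count requests per IP.
# $1 = grep pattern used as the pre-filter, $2 = path to the log file.
top_ips() {
    grep -- "$1" "$2" | awk '{print $1}' | sort | uniq -c | sort -rn
}

# Only count real GET requests; the malformed "-" request is filtered out.
top_ips 'GET ' "$log"
```

With this sample, 218.5.46.14 is counted twice and 220.181.126.47 once. Other patterns work the same way, for example a date such as `26/Jun/2013` or a user-agent string that identifies the crawler.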
#Python#
  • Author: Changkun Ou
  • Link: https://changkun.de/blog/posts/simple-logging-analysis/
  • All articles in this blog are licensed under CC BY-NC-ND 4.0 unless stated otherwise.
© 2008 - 2026 Changkun Ou. All rights reserved.