Website Anomaly Monitoring and Handling

When doing SEO, we often look through a website's log files to analyze its condition and make judgments. But if you run hundreds or thousands of sites, would you still go through each log file one by one?

When running a large network of websites, we generally monitor them in tiers according to each site's importance. Important resource sites, for example, may be analyzed and maintained as carefully as main sites, while some sites are only looked at when problems arise. Others may simply be left to thrive or perish, depending on the network's strategy; there is no universal approach.

For the network as a whole, I typically rely on anomaly monitoring: when an anomaly is detected, I analyze and handle it manually, and I rarely check the sites on a regular schedule.

Definition of Anomaly Scenarios

When monitoring the operation of the websites, we first need to define what counts as an anomaly. The anomalies I personally watch for fall mainly into the following five types:

  1. Spider visit frequency anomaly: for example, a sudden penalty that leaves spiders no longer visiting, or unusually frequent spider visits following a penalty.
  2. Website traffic anomaly: traffic across the site network should not fluctuate sharply as a whole; if it does, someone may be scraping or attacking the sites.
  3. 404 anomaly: a rise in page-not-found (404) errors, which needs to be dealt with promptly.
  4. Special page traffic anomaly: abnormal traffic on important pages, such as Taobao redirect pages; comparing their traffic and conversion rates helps you understand where the traffic is coming from.
  5. Special term traffic anomaly: if market search volume and the click-through rate of the search result description stay constant, the traffic a specific term brings reflects its keyword ranking.

Monitoring Methods

To monitor the anomalies above, we can create a data table for each metric, labeling them A, B, C, D, and E, and then set up an automated task that saves each website's daily figures into the database.
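
As a rough sketch of what those tables might look like, they could be created once up front with one row per site per day. The schema, connection string, and the choice to create them from C# are all assumptions for illustration, not the author's actual setup:

    using System.Data.SqlClient;

    class CreateMetricTables
    {
        static void Main()
        {
            // Assumed connection string; replace with your own server and database.
            string connStr = "Server=ServerIP;Database=DatabaseName;User Id=sa;Password=***;";
            // One table per metric: spider hits, traffic, 404s, special pages, special terms.
            string[] tables = { "A", "B", "C", "D", "E" };

            using (var conn = new SqlConnection(connStr))
            {
                conn.Open();
                foreach (string t in tables)
                {
                    // Create the table if it does not exist yet: one row per site per day.
                    string sql = "IF OBJECT_ID('" + t + "') IS NULL " +
                                 "CREATE TABLE " + t +
                                 " (SiteDomain NVARCHAR(100), LogDate DATE, Hits INT)";
                    new SqlCommand(sql, conn).ExecuteNonQuery();
                }
            }
        }
    }

Note that if you let Log Parser write into a table directly (as in the command in the next section), its SQL output generally expects the table's columns to match the query's output fields, so you may prefer to keep such tables minimal or insert the rows from your own program instead.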

Under IIS, Microsoft's Log Parser tool is recommended: it can query log files with SQL-like statements, and its parameters are easy to look up with a search engine.

Specific Usage

Taking the first anomaly (spider visit frequency) as an example, you can count spider visits and write the result into table A with a command like the following:

    LogParser -i:IISW3C "SELECT COUNT(*) AS hits INTO A FROM xxx.log WHERE cs(User-Agent) LIKE '%spider%'" -o:SQL -server:ServerIP -driver:"SQL Server" -database:DatabaseName -username:sa -password:***
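
To turn this into the automated daily task mentioned earlier, one option is a small C# console program run by Windows Task Scheduler that loops over every site and shells out to LogParser.exe. The site list, log paths, file-naming convention, install path, and connection details below are placeholders, not the author's actual configuration:

    using System;
    using System.Diagnostics;

    class DailySpiderCount
    {
        static void Main()
        {
            // Hypothetical site list and log layout; adjust to your own environment.
            string[] sites = { "site1.com", "site2.com" };
            string yesterday = DateTime.Today.AddDays(-1).ToString("yyMMdd");

            foreach (string site in sites)
            {
                // IIS 7+ names its W3C logs u_exYYMMDD.log by default.
                string log = @"D:\logs\" + site + @"\u_ex" + yesterday + ".log";

                // Same query as the command above; for the other metrics only the
                // filter changes, e.g. "WHERE sc-status = 404" for the 404 table.
                string args = "-i:IISW3C \"SELECT COUNT(*) AS hits INTO A FROM " + log +
                              " WHERE cs(User-Agent) LIKE '%spider%'\"" +
                              " -o:SQL -server:ServerIP -driver:\"SQL Server\"" +
                              " -database:DatabaseName -username:sa -password:***";

                // Default install path of Log Parser 2.2; change it if yours differs.
                var p = Process.Start(@"C:\Program Files (x86)\Log Parser 2.2\LogParser.exe", args);
                p.WaitForExit();
            }
        }
    }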

Anomaly Handling

During preprocessing, compare each metric's value for the current day against the previous day to get the difference, and set a threshold: anything beyond the threshold is treated as an anomaly. Traffic anomalies, for example, can be judged by percentage change, with a swing of more than 30% counting as an anomaly, while 404 errors can be judged by simple subtraction.
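
A minimal sketch of those two checks (the 30% figure comes from the text; the 404 allowance and the sample numbers are made up for illustration):

    using System;

    class AnomalyChecks
    {
        // Traffic (metric B): judged by percentage change against the previous day.
        static bool IsTrafficAnomaly(int today, int yesterday)
        {
            if (yesterday == 0) return today > 0;                           // avoid dividing by zero
            return Math.Abs(today - yesterday) / (double)yesterday > 0.30;  // >30% swing = anomaly
        }

        // 404s (metric C): judged by simple subtraction against a fixed allowance.
        static bool Is404Anomaly(int today, int yesterday, int allowedIncrease)
        {
            return today - yesterday > allowedIncrease;
        }

        static void Main()
        {
            Console.WriteLine(IsTrafficAnomaly(today: 5200, yesterday: 3800));                 // True (~37% jump)
            Console.WriteLine(Is404Anomaly(today: 160, yesterday: 20, allowedIncrease: 100));  // True
        }
    }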

I use a C# program to handle these anomalies: it compares the latest 404 figures against the previous data to detect an anomaly, for example, and when one is found it sends an email notification so the problem can be dealt with promptly.
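
The author's program isn't shown, but a bare-bones email alert in C# using System.Net.Mail might look like the following; the SMTP host, addresses, and credentials are placeholders:

    using System.Net;
    using System.Net.Mail;

    class AnomalyMailer
    {
        // Send a plain-text alert when an anomaly is detected.
        static void SendAlert(string site, string metric, int yesterday, int today)
        {
            var message = new MailMessage("monitor@example.com", "admin@example.com")
            {
                Subject = "[Anomaly] " + metric + " on " + site,
                Body = site + ": " + metric + " went from " + yesterday + " to " + today + "."
            };

            using (var client = new SmtpClient("smtp.example.com", 25))
            {
                client.Credentials = new NetworkCredential("monitor@example.com", "***");
                client.Send(message);
            }
        }

        static void Main()
        {
            SendAlert("site1.com", "404 count", 20, 160);
        }
    }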

Other Suggestions

Besides the methods above, you can also use Log Parser to split the logs and then push them to a specified FTP address with FTP commands, so the data can be used directly without manual processing each time.
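
For the FTP step, a small upload sketch using .NET's FtpWebRequest could look like this; the server URL, credentials, and split-log file path are placeholders:

    using System.IO;
    using System.Net;

    class LogUploader
    {
        // Upload one split log file to an FTP folder.
        static void Upload(string localFile, string ftpFolderUrl, string user, string password)
        {
            var request = (FtpWebRequest)WebRequest.Create(ftpFolderUrl + "/" + Path.GetFileName(localFile));
            request.Method = WebRequestMethods.Ftp.UploadFile;
            request.Credentials = new NetworkCredential(user, password);

            byte[] data = File.ReadAllBytes(localFile);
            using (Stream stream = request.GetRequestStream())
            {
                stream.Write(data, 0, data.Length);
            }
            request.GetResponse().Close();
        }

        static void Main()
        {
            Upload(@"D:\logs\split\site1_spider.log", "ftp://ftp.example.com/logs", "ftpuser", "***");
        }
    }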

Overall, anomaly monitoring is an important way to keep websites secure and running stably. Detecting and handling anomalies promptly helps ensure normal operation and a good user experience.