Understanding web log statistics and media metrics
By Craig Goldwyn, visibility.tv
Your server gathers a lot of info about how it is being used. Most of it is useless. Three stats are very informative: "page views," "referrers," and "search strings." Pay close attention to them.
Media buyers need to know how many people will see their ads. Television has its ubiquitous Nielsen rating service. But on the internet, there are many different ways to measure, and more will surely evolve with time. On the internet you get much more accurate info, but it is still not accurate enough.
Think of your ad as a store. TV ratings tell you how many people enter and exit your store. But internet metrics can tell you what shelves they lingered before, what they picked up, how long they were in the store, where they were before they entered your store and where they went after leaving your store. All very useful if you know how to analyze the data.
"Hits stands for 'How Idiots Track Success'" Avinash Kaushik, Director of Web Research & Analytics, Intuit
Let's cut to the chase. Hits are meaningless. That's right, you can ignore anyone who brags they are getting gazillions of hits. Why? Because counting hits is like counting snowflakes when it is inches of snow that matter. Website marketers who quote usage stats in their promotions are looking in a rear view mirror where objects usually appear larger than they really are. Remember the old line that there are lies, damn lies, and statistics? This was never truer than when it comes to web stats. On the other hand, your stats are very useful to see how usage is growing or shrinking, what promotions are working, how people find you, and what pages are the most popular. So what matters when measuring usage? The most important stats are page views, referrers, and search strings. Most of the other stats are meaningless to the end user.
Confused? Here's what you need to know about usage stats.
Servers and clients. Your website lives on a computer called a server. There is special server software on the computer that responds to requests from people when they click on a link to something on your site. Web browser software is also called client software, and it talks to your server software. So what you have is servers and clients talking to each other.
Logs and analysis software. Most server software records a lot of info about all client requests in a text file called a log. Logs contain a lot of info about what people are doing on your site and where they come from. But most of the info they collect is of little use to you. In order to understand these statistics, naturally you need to slog through some jargon. There are a number of programs that can interpret these logs. They produce cool color graphs that can help you understand your stats.
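To make "a text file called a log" concrete, here is a minimal sketch of reading one log line. It assumes the NCSA "combined" log format that many servers can be configured to write; your server may use a different layout. The sample line and the names LOG_PATTERN and parse_line are invented for illustration.

```python
import re

# Pattern for one line of the NCSA "combined" log format (an assumption;
# the article does not say which format a given server writes).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

# Made-up sample line, for illustration only.
sample = (
    '192.0.2.10 - - [10/Oct/2006:13:55:36 -0500] '
    '"GET /recipes/ribs.html HTTP/1.0" 200 2326 '
    '"http://www.example.com/links.html" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"'
)

entry = parse_line(sample)
print(entry["ip"], entry["path"], entry["referrer"])
```

Each field in the resulting dict corresponds to one of the stats described in the rest of this article: the address, the file requested, the bytes sent, the referrer, and the user agent.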
Hits or requests. Here's why hits don't matter. Every time a browser calls for a page it requests the page (one hit) and all the graphics on the page (one hit each). Let me repeat that. THE DELIVERY OF THE PAGE CONSTITUTES ONE HIT, AND EVERY GRAPHIC ON THE PAGE IS ALSO A HIT. So a page that is all text will count only one hit, while another page with 20 graphics, including buttons on your navigation bar, will log 21 hits! Hits are useful in studying how well your server is responding to the load placed upon it by visitors to the site, but they are MEANINGLESS in understanding how much usage your site is getting. That's why pages are more useful than hits from a business standpoint.
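The arithmetic is easy to check with a small sketch. The file names below are invented, and treating only .htm and .html files as pages is a simplifying assumption. The page itself plus its 20 graphics comes to 21 requests, but only one page view.

```python
# Requested paths for one visitor loading one page that carries 20 graphics.
# The file names are invented for illustration.
requests = ["/index.html"] + [f"/images/button{n}.gif" for n in range(1, 21)]

hits = len(requests)  # every request counts as a hit
page_views = sum(1 for path in requests if path.endswith((".htm", ".html")))

print(hits, page_views)  # 21 hits, 1 page view
```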
Pages or Page Views. The definition of a page can vary from server to server depending on how it is set up. Typically anything with the extension htm, html, cgi, phtml, php3, or asp is defined as a page. Page views tell you how many page impressions you got. This is the closest measure you will get to how much usage your site is getting.
CSS and js suffixes. Many webmasters use Cascading Style Sheets (filename.css) and JavaScript files (filename.js) to control the look and performance of the pages on a website. They can control the color of the type, the background color, the color of links, the fonts, rollovers, and other look and feel issues. Almost every page I create is linked to a style sheet. So when a client calls for a page, the server logs the page AND the style sheet. So the ACTUAL number of page views is the page view total minus the css and js totals.
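One way to keep style sheets and scripts out of the page count is to sort requests by file extension. This is a sketch, not how any particular analyzer works: the page extensions are the ones listed above, the request paths are invented, and the names PAGE_EXTENSIONS, SUPPORT_EXTENSIONS, and classify are hypothetical.

```python
import os
from collections import Counter

# Extensions the article lists as typical "pages"; a real analyzer's list
# depends on how it is configured.
PAGE_EXTENSIONS = {".htm", ".html", ".cgi", ".phtml", ".php3", ".asp"}
SUPPORT_EXTENSIONS = {".css", ".js"}

def classify(path):
    """Label a requested path as 'page', 'support' (css/js), or 'other'."""
    # Drop any query string before looking at the extension.
    extension = os.path.splitext(path.split("?", 1)[0])[1].lower()
    if extension in PAGE_EXTENSIONS:
        return "page"
    if extension in SUPPORT_EXTENSIONS:
        return "support"
    return "other"

# Invented request paths for illustration.
paths = ["/index.html", "/style.css", "/menu.js", "/logo.gif",
         "/about.htm", "/style.css", "/search.cgi?q=ribs"]

counts = Counter(classify(p) for p in paths)
print(counts["page"], counts["support"], counts["other"])
```

Counting pages directly like this gives the same result as taking a total that includes css and js files and subtracting them afterward.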
Referrers. Your server logs the URLs that lead a user to your site or caused a user's browser to request something from your server. If there is a link to you on a page, the referrer report can be very helpful by telling you who has links to you and how much traffic they send. But if someone comes to you by using a bookmark, the log may not see a referrer. That can explain why a referrer might be a porn site! The vast majority of requests to your server are made from your own pages, since most html pages contain links to other pages and objects such as graphics files. Also, referrer reports cannot show people who use you as a home page or who come to you from a page within AOL, because most AOL pages are not made with html but with a proprietary program called Rainman.
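Because most referrers are your own pages, a useful referrer report filters those out first. The sketch below assumes a hypothetical site name, www.example.com, and assumes that a missing referrer is logged as "-" or an empty string. The referrer values are invented.

```python
from collections import Counter
from urllib.parse import urlparse

OWN_HOST = "www.example.com"  # hypothetical name for your own site

def external_referrer_hosts(referrers, own_host=OWN_HOST):
    """Count referring hosts, skipping blanks and links from your own pages."""
    counts = Counter()
    for referrer in referrers:
        if referrer in ("", "-"):      # no referrer logged (e.g. a bookmark)
            continue
        host = urlparse(referrer).netloc.lower()
        if host and host != own_host:
            counts[host] += 1
    return counts

# Invented referrer values for illustration.
referrers = [
    "http://www.example.com/index.html",
    "http://grilling.example.org/links.html",
    "-",
    "http://grilling.example.org/best-sites.html",
    "http://news.example.net/story.html",
]

report = external_referrer_hosts(referrers)
print(report.most_common())
```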
Search strings. If someone came to your site by entering words into a search engine such as Google or Yahoo!, the server will log the search string of words they used. This can be VERY useful. For example, on a cooking website the owner was surprised to discover that people were coming to his site after searching on the word Weber because several of his articles referred to Weber kettle grills. This info can be very useful in selecting keywords for metatags, site descriptions, and online advertising.
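The search string usually arrives as part of the referrer URL, so it can be pulled out with a URL parser. This sketch assumes the search words sit in a query parameter named "q" or "p"; each search engine chooses its own parameter name, so treat that list as an assumption to adjust. The URLs are invented.

```python
from urllib.parse import urlparse, parse_qs

# Query parameter names that carry the search words. "q" and "p" are
# assumptions for this sketch; each search engine chooses its own.
SEARCH_PARAMS = ("q", "p")

def search_string(referrer):
    """Return the search words found in a referrer URL, or None."""
    query = parse_qs(urlparse(referrer).query)
    for name in SEARCH_PARAMS:
        if name in query:
            return query[name][0].lower()
    return None

# Invented referrer URLs for illustration.
print(search_string("http://www.google.com/search?q=weber+kettle&hl=en"))
print(search_string("http://www.example.org/links.html"))
```

Tallying the returned strings across a month of log lines shows which words bring people to the site.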
Sites, IP addresses, and users. Every computer has a unique Internet Protocol (IP) address when it is online. Some computers have a permanent IP address, and others are assigned a temporary IP address each time they log on. IP addresses are assigned by internet access providers such as AOL, MSN, ATT, etc. The name of the computer or the access provider can be looked up automatically, and that name is called a site. The sites statistic in your log shows how many unique IP addresses made requests to the server. This DOES NOT mean the number of unique individual users (aka real people). That is because some homes and offices share an IP address, and if someone uses dialup via AOL and is assigned an IP address, then goes to lunch and logs off, comes back and logs on again, he is assigned a different IP address, and the log cannot tell it is the same person. Because many of AOL's IP addresses are in Vienna, Virginia, it is impossible for you to tell how many people in Atlanta use your site. Also, when an AOL user signs off, and her IP address is reassigned to another user, your server sees only one IP address, but it is really two people. Likewise, if Dad logs on at one IP address, and then Mom logs on at the same address later, the server sees only one user.
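A tiny sketch shows how the sites count drifts away from the number of real people. The "person" column is invented ground truth that a real log never contains; it is here only to compare against the unique IP count.

```python
# (ip, person) pairs; "person" is invented ground truth a real log never has.
requests = [
    ("198.51.100.7", "Dad"),
    ("198.51.100.7", "Mom"),                 # same household, same address
    ("203.0.113.20", "dialup user"),
    ("203.0.113.99", "dialup user"),         # same person after reconnecting
    ("203.0.113.20", "second dialup user"),  # address reassigned to someone else
]

unique_ips = {ip for ip, _ in requests}
unique_people = {person for _, person in requests}

print(len(unique_ips), len(unique_people))
```

Here the log would report three sites although four people visited. The errors run in both directions, so the sites figure is neither a floor nor a ceiling.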
Unique visitors. In order to get a more accurate count of visitors, some sites require you to sign in, but that erects a barrier that many people choose not to scale. Another technique is planting a cookie. A cookie is a small bit of data that can be placed on the client's browser. When the browser calls for a page, the server can check for the cookie and record it. Even cookies can be inaccurate because often several people use the same computer, perhaps in a library or an office or a home; some people tell their browser to refuse cookies; and others remove their cookies regularly. This method is getting more accurate, and more and more advertisers are paying attention to this stat.
Countries. Not very useful nowadays. This is determined by the top level domain of the service provider of the user. It was once useful, but nowadays, people using the .TV domain could come from anywhere, not just the island nation of Tuvalu, and the most common domains, .COM (US Commercial), .NET (Network), .ORG (Non-profit Organization) and .EDU (Educational) can come from anywhere.
How bots, crawlers, and even you can screw up the logs. Sites and referrers may contain the words bot or crawler. These are programs that crawl the web looking for pages to add to their lists. These are not real people, and some of them visit a site so often they can really mislead your stat analysis if your site is not being visited by real people often. Likewise, your log software is probably counting your own visits to your site as well as those of your employees. Some log analysis software can be told to skip counting visits from selected IP addresses, but this feature only helps if you have a permanent IP address. Most users do not.
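A rough filter for both problems might look like the sketch below. It assumes robots identify themselves with the words "bot" or "crawler" in the user agent field of the log, which well-behaved ones usually do and badly behaved ones do not, and it assumes you have a fixed address to exclude. The addresses, agent strings, and names are invented.

```python
# Words that mark a self-identified robot, per the article ("bot", "crawler").
BOT_MARKERS = ("bot", "crawler")
# Hypothetical fixed addresses for you and your staff.
EXCLUDED_IPS = {"192.0.2.1"}

def is_human_visit(ip, user_agent):
    """Heuristic: drop excluded addresses and self-identified robots."""
    if ip in EXCLUDED_IPS:
        return False
    agent = user_agent.lower()
    return not any(marker in agent for marker in BOT_MARKERS)

# Invented (ip, user agent) pairs for illustration.
requests = [
    ("203.0.113.5", "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"),
    ("198.51.100.9", "ExampleBot/1.0 (+http://bot.example.com/)"),
    ("192.0.2.1", "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"),
    ("198.51.100.12", "example-crawler/2.1"),
]

kept = [r for r in requests if is_human_visit(*r)]
print(len(kept))
```

Only the first request survives the filter. A robot that pretends to be a browser would slip through, so treat the result as an estimate.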
Visits. When a server is set up, one can set a feature called the "visit timeout." This is the amount of time between requests from the same IP address necessary to count as a separate visit. In other words, if the timeout is 30 minutes, a typical setting, then if my IP address requests a file from your site at 9 a.m. and another at 9:15 a.m., it is considered the same visit. If instead the second request comes at 9:31, it is counted as a second visit. But you cannot use server stats to calculate how long someone is on a site. Let's say I go to your site at noon and read a few pages. Then at 12:05, I go out to lunch while my browser is still on your page. Then I come back at 12:25 because I only have a 20 minute break. When I come back, I click on a link on your page and the server software records the time. I have not really spent 20 minutes on your site, have I?
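The visit timeout rule can be sketched in a few lines. This assumes a new visit begins when the gap since the previous request from the same address is longer than the timeout; whether a gap of exactly 30 minutes starts a new visit depends on the software. The function names are hypothetical.

```python
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=30)  # the "typical setting" from the text

def count_visits(request_times, timeout=VISIT_TIMEOUT):
    """Count visits for one IP address given its request times."""
    visits = 0
    previous = None
    for moment in sorted(request_times):
        # A new visit starts on the first request, or once the gap since
        # the previous request exceeds the timeout.
        if previous is None or moment - previous > timeout:
            visits += 1
        previous = moment
    return visits

def at(hour, minute):
    """Build a time on an arbitrary date for the examples below."""
    return datetime(2006, 1, 2, hour, minute)

print(count_visits([at(9, 0), at(9, 15)]))  # one visit
print(count_visits([at(9, 0), at(9, 31)]))  # two visits
```

Note that the code only ever sees the moments when requests arrive. It has no way to know what the visitor was doing between them, which is why time on site cannot be read from these numbers.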
Entry and exit pages. Whenever a visit is triggered or ended, the server logs the page the visitor entered or exited. This can yield some useful info, but you really need to think carefully about what it means.
Files. Most requests made to the server require that the server send a page or graphic to the client. It is called a file. Hits and files can be thought of as requests and responses. The numbers are rarely identical because some requests ask the server to do something other than send a file such as sorting a database. In any case, files are, like hits, not a useful measure unless you are analyzing how well your server is responding to the load.
Kbytes or kilobytes or KBs. The KB value shows the amount of data, in kilobytes (1024 bytes), that was sent out by the server. Some hosting services bill you for the number of KBs they have to send. Otherwise this number is relatively meaningless to the end user.
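If you want to check a hosting bill, the KB total is just the sum of the bytes sent on each request. The sketch assumes sizes are logged in bytes and that "-" marks a response with no body, as in common log formats. The values are invented.

```python
# Response sizes in bytes, as logged per request; "-" means no body was sent.
# Values are invented for illustration.
sizes = ["2326", "10240", "-", "512"]

total_bytes = sum(int(s) for s in sizes if s.isdigit())
total_kb = total_bytes / 1024  # 1 KB = 1024 bytes, as in the text

print(f"{total_kb:.1f} KB")
```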
How caches screw things up. Although page views are the most accurate measurement, they can also be misleading. Your browser will often store a copy of a page in its cache on your hard drive so that when you request the page a second time, the browser will simply fetch it from your hard drive and it will not have to wait for it to come across the net. Of course if a page is being served from your hard drive, your webserver can't count it as a page view. To make matters worse, AOL and many universities and corporations with big networks have network caches that can hold thousands of pages.
Failed requests. This can happen if someone clicks away from a page before the entire page is in their browser.
Monthly report. Shows monthly stats.
Daily summary. Shows what days you are getting the most usage. This number can reveal usage spurts such as when you send out an email newsletter. It can also show spikes in usage if you are featured in the newspaper or on a website like Yahoo.
Hourly summary. Shows during what hour you get your most usage. This is the hour of the clock on your server, so a server in NY will show different hourly usage than a server in SF.
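The effect of the server clock is easy to see in a sketch. It assumes log timestamps in the style used by common log formats, including a UTC offset, and it uses fixed offsets that ignore daylight saving time. The timestamps and function name are invented.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # timestamp style of common log formats

def hourly_counts(timestamps, tz=None):
    """Count requests per clock hour, optionally shifted into zone `tz`."""
    counts = Counter()
    for text in timestamps:
        moment = datetime.strptime(text, LOG_TIME_FORMAT)
        if tz is not None:
            moment = moment.astimezone(tz)
        counts[moment.hour] += 1
    return counts

# Invented timestamps from a server whose clock is on US Eastern standard time.
stamps = [
    "10/Jan/2006:12:05:00 -0500",
    "10/Jan/2006:12:40:00 -0500",
    "10/Jan/2006:13:10:00 -0500",
]

pacific = timezone(timedelta(hours=-8))
print(hourly_counts(stamps))           # hours as the server clock sees them
print(hourly_counts(stamps, pacific))  # the same requests on a Pacific clock
```

The same three requests land in the noon and 1 p.m. hours on the New York clock and in the 9 a.m. and 10 a.m. hours on the San Francisco clock.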
User agent or browser summary. A typical listing might look like this: "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)." Mozilla is a standard to which most browsers adhere. The version of the standard is 4.0. The actual browser is MSIE (Microsoft Internet Explorer) version 6.0 and the operating system being used is Windows 98. If the browser is listed as "Mozilla/3.01 (compatible)" it is almost always Netscape. That is because Netscape came first and its underpinnings were named Mozilla.
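The two rules of thumb above can be turned into a rough sketch. It is only as reliable as the rules themselves, since a user agent string is whatever the client chooses to send, and it knows nothing about other browsers. The names MSIE_TOKEN and describe_agent are hypothetical.

```python
import re

# Matches the "MSIE <version>" token seen in the example listing above.
MSIE_TOKEN = re.compile(r"MSIE (\d+(?:\.\d+)?)")

def describe_agent(user_agent):
    """Very rough browser guess that follows the article's two rules of thumb."""
    match = MSIE_TOKEN.search(user_agent)
    if match:
        return "Internet Explorer " + match.group(1)
    if user_agent.startswith("Mozilla/"):
        return "Netscape (probably)"
    return "unknown"

print(describe_agent("Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)"))
print(describe_agent("Mozilla/3.01 (compatible)"))
```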
Tools
Your hosting service should provide you with raw unprocessed logs at a bare minimum. For a slight fee, they should provide access to analysis software that organizes the info into useful tables and charts. If you need more analysis, here are some options.
Google Analytics. New. Free. Cool. Waiting list. Get on it. There's even a blog devoted to it at http://analytics.blogspot.com
ClickTracks. A highly capable system that can yield great insight.
Ratings services
If you are thinking of buying ads, here are the companies that provide the metrics you need to make decisions.
comScore MediaMetrics. Logs every move of 120,000 web surfers.
Nielsen//NetRatings. Also logs the habits of thousands of web users.
Hitwise. Gathers its stats from internet service providers. Many think this is the best method.
Microsoft adLabs. A prototype service attempts to predict the gender and age of the users of any url you enter.
References
Here are some excellent blogs on the subject.
Occam's Razor. A good blog on the subject of understanding web stats and ratings by Avinash Kaushik of Intuit.
This Just In. Justin Cutroni's excellent blog on the subject.
Lunametrics blog. Increasing your website’s conversion rate by Robbin Steif.