Blog

Piwik is an alternative to Google Analytics

Posted on May 5, 2017 in Choose Privacy Week, data mining, libraries, Privacy Awareness, Privacy vs. Library 2.0, Protecting Privacy, student data privacy, Vendor Privacy

By Adam Chandler
Cornell University Libraries
Director, Automation, User Experience, and Post-Cataloging Services

Like many other libraries, Cornell University Library uses Google Analytics (GA) to track website usage. GA, designed to support Google’s primary revenue stream, advertising, has many strengths, especially the fact that it does not cost money. However, given our tradition in libraries to protect reader privacy, a compelling argument can be made that Google Analytics is inappropriate for libraries. After a review of alternatives to GA following Edward Snowden’s revelations, we selected Piwik (piwik.org) as a replacement for GA. Piwik is free, open source, and perhaps most importantly, it supports local data collection. In this brief blog post, I will summarize what some in the library literature say about web analytics tools, explain why we selected Piwik, and describe what is involved when migrating from GA to Piwik.

This blog post is an abridged version of a much longer article I co-authored with Melissa Wallace.1 In researching that article, we found recommendations to use Google Analytics written by librarians in every year back to 2007. The librarian-authored articles advocating GA make it clear that librarians like the tool; what is odd is the extent to which the authors are disconnected from the reader privacy tradition in libraries. Privacy is occasionally mentioned as a consideration, but never with enough weight to change the recommendation to use Google Analytics. The most explicit statement against the use of GA in libraries that we found is a blog post by Susanna Galbraith, published by the Ontario Library Association. Galbraith writes:

Many of us in the library community who have a responsibility to assess the usage of our library’s websites have become very familiar with the popular Google Analytics. Google Analytics is free and robust, and yet the data it collects belongs to Google and is housed on U.S. servers, where data may be subject to the legislation of that country. While many may see this as inconsequential (hey, Canada.ca uses Google Analytics, why can’t we?), those of us in the library community who wish to uphold the longstanding tradition in our profession of protecting user privacy, may wish to seek other alternatives.2

We agree with Galbraith. For privacy-related reasons alone, Piwik is a better web analytics solution for libraries. It is also a powerful open source web analytics tool that is, feature for feature, on par with GA. The table below is a high-level summary of the two products.

| Functionality | Piwik | Google Analytics |
| --- | --- | --- |
| Data storage | Library controlled server | Google controlled server |
| Data may be collected by JavaScript widget embedded on page | yes | yes |
| Data may be collected by ingesting Apache log files | yes | no |
| Command line SQL access to database | yes | no |
| Aggregate IP addresses to location-based groups defined by library | yes | no |
| Management of logins | Centralized | Decentralized |
| API | yes | yes |
| Real-time data | yes | yes |
| Event tracking | yes | yes |
| Segment or filter data | yes | yes |
| Customizable dashboard | yes | yes |
| E-commerce support | yes | yes |
| Goal conversion tracking | yes | yes |
| Search keywords | yes | yes |
| Geolocation | yes | yes |
| Heat mapping | yes | yes |
| Reporting features (email, export, etc.) | yes | yes |
| IP and URL exclusion | yes | yes |
| Plugins/CMS integration | yes | yes |
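
To make the "aggregate IP addresses to location-based groups" row more concrete, here is a minimal Python sketch of the general idea. It is not Piwik's implementation or configuration format. The group names, the address ranges (reserved documentation ranges), and the function names are all assumptions made up for illustration.

```python
import ipaddress
from collections import Counter

# Hypothetical, library-defined groups. The CIDR ranges are reserved
# documentation addresses (RFC 5737), not real campus networks.
LOCATION_GROUPS = {
    "Main library": ipaddress.ip_network("192.0.2.0/25"),
    "Branch library": ipaddress.ip_network("192.0.2.128/25"),
    "Campus wireless": ipaddress.ip_network("198.51.100.0/24"),
}


def location_group(ip):
    """Return the name of the first group whose range contains the address."""
    address = ipaddress.ip_address(ip)
    for name, network in LOCATION_GROUPS.items():
        if address in network:
            return name
    return "Off campus"


def visits_by_group(ips):
    """Count visits per group so a report never lists individual addresses."""
    return Counter(location_group(ip) for ip in ips)


if __name__ == "__main__":
    sample = ["192.0.2.10", "192.0.2.200", "198.51.100.7", "203.0.113.5"]
    for group, count in visits_by_group(sample).items():
        print(group, count)
```

Reporting on groups rather than on single addresses is what makes this kind of aggregation attractive from a reader privacy standpoint.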

 

Piwik installation was relatively simple, with a library systems administrator following the steps outlined in Piwik’s online documentation. It is hosted on a Cornell University server. The university’s standard security profile is in place, with periodic scans and monitoring by Cornell central IT. We chose a user-friendly, product-agnostic URL (webanalytics.library.cornell.edu), at which the installation could be completed through an easy point-and-click process. In addition to the default installation, we set up the recommended automated cron task to process reports periodically; without this task the system would recalculate statistics on the fly and would be considerably slower. Last, we used Piwik’s log import script to parse our Apache logs. This process was also straightforward, and once configured, it runs automatically and does not require much day-to-day maintenance.
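
For readers who have not worked with Apache logs, the sketch below shows roughly what any log-based importer has to do first: turn each line of an access log into structured fields. It is not Piwik's log import script. It assumes the server writes Apache's "combined" log format, and the sample line, field names, and function name are invented for illustration.

```python
import re
from datetime import datetime

# Pattern for Apache's "combined" log format (an assumption about the server
# configuration; other LogFormat settings would need a different pattern).
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return one log line as a dict of fields, or None if it does not match."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None
    hit = match.groupdict()
    hit["status"] = int(hit["status"])
    # Apache writes "-" when no response body was sent.
    hit["size"] = 0 if hit["size"] == "-" else int(hit["size"])
    hit["time"] = datetime.strptime(hit["time"], "%d/%b/%Y:%H:%M:%S %z")
    return hit


if __name__ == "__main__":
    # Made-up sample line using a reserved documentation address.
    sample = (
        '192.0.2.10 - - [05/May/2017:10:15:32 -0400] '
        '"GET /hours HTTP/1.1" 200 5120 '
        '"https://www.example.org/" "Mozilla/5.0"'
    )
    print(parse_line(sample))
```

Because the logs already exist on the web server, this style of collection needs no changes to the pages themselves.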

In addition to collecting data from Apache logs, CUL also collects web statistics via JavaScript. While the JavaScript embed code must be manually added to websites, it allows for greater customization and additional features, such as a real-time map of visitors and the tracking of exit links. The JavaScript option also allows us to collect statistics on sites that are hosted by third parties, such as ILLiad and 360 Link.

We would be remiss if we failed to acknowledge that not every institution has the IT resources of a library like Cornell. Before Piwik can see widespread adoption across libraries, IT support is a gap that might need to be filled by a privacy-sensitive non-profit.


References

1. Adam Chandler and Melissa Wallace, “Using Piwik Instead of Google Analytics at the Cornell University Library,” The Serials Librarian 71, no. 3–4 (November 16, 2016): 173–79, doi:10.1080/0361526X.2016.1245645.

2. Susanna Galbraith, “Piwik: Breaking Away from Google Analytics,” Open Shelf, http://www.open-shelf.ca/160215-piwik/ (accessed February 15, 2016).

Choosing Privacy for Public Computers in Libraries

Posted on May 4, 2017 in Choose Privacy Week, Encryption, libraries, Privacy and Public Computers, Privacy Awareness, Protecting Privacy, reader privacy

By Matt Beckstrom
Systems Manager Librarian, Lewis & Clark Library, Helena, MT
Author, Protecting Patron Privacy: Safe Practices for Public Computers

Most of us offer some kind of public computers for our patrons, and obviously privacy is a concern. What should we be doing for our patrons when it comes to privacy on public computers? What steps can we take?

First of all, we have to remember that privacy is difficult, especially once we introduce the variable of the patrons themselves. No matter what we do, their behavior on the computer can expose them in ways we cannot stop. If they enter their personal information on a site, or sign up for something, they are giving up their privacy. It is also possible that by over-configuring a computer for privacy, we prevent the user from accessing or using some resources. There must be a middle ground where we provide a certain level of protection, but not so much that we block useful resources.

Anyone who uses a computer leaves traces of that use all over the machine. Browsing history, lists of recently opened files, and application usage are collected and stored on the computer. The most efficient way to deal with this stored information is a PC clearing application like Deep Freeze. These applications restore the computer to a ‘clean’ state when it is rebooted, so any trace of use between reboots is removed. As an alternative to a full reboot, other applications can wipe patron use from the computer. CCleaner, a popular file and registry cleaning application, can be scheduled to run at certain times of the day using the Windows Task Scheduler. It runs through the computer and deletes any temporary files or browser traces it finds. Many applications can also be configured either to store no information or to remove it when the user is finished. Most Internet browsers can be set to a private browsing mode that does not store temporary files or keep a history of the sites visited. Internet Explorer and Microsoft Edge call this mode InPrivate, Chrome calls it Incognito, and Firefox calls it Private Browsing. Applications like Microsoft Word and Excel can be configured not to show a list of the most recently used documents.
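
One way to put the private browsing advice into practice is to have the desktop shortcut or kiosk launcher start the browser with its private-mode switch. The Python sketch below only builds the command line. The switches are the ones Chrome, Firefox, and the Chromium-based version of Edge accept to the best of my knowledge, so check them against each browser's documentation before relying on them. The executable path and start URL are hypothetical placeholders.

```python
# Command-line switches that ask each browser to open in its private mode.
# Internet Explorer and the original (non-Chromium) Edge are left out here.
PRIVATE_MODE_FLAGS = {
    "chrome": "--incognito",
    "firefox": "-private-window",
    "msedge": "--inprivate",
}


def private_launch_command(browser, executable, start_url=None):
    """Build the argument list for starting a browser in private mode."""
    if browser not in PRIVATE_MODE_FLAGS:
        raise ValueError("no private-mode switch known for " + browser)
    command = [executable, PRIVATE_MODE_FLAGS[browser]]
    if start_url:
        command.append(start_url)
    return command


if __name__ == "__main__":
    # Hypothetical install path and start page, for illustration only.
    print(private_launch_command(
        "firefox",
        r"C:\Program Files\Mozilla Firefox\firefox.exe",
        "https://www.example.org/",
    ))
```

The resulting list can be passed to `subprocess.Popen` or copied into a shortcut. A launcher like this does not stop a patron from opening an ordinary window, so it works best alongside a clearing tool.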

One of the biggest concerns with public computers is spyware or viruses that get onto the machines. Many types of spyware, once installed, begin to collect information about the person using the computer and transmit it back to someone on the Internet. Many of these applications can be blocked by making sure your public computers have up-to-date anti-virus software installed. When choosing an anti-virus application, make sure that it blocks both viruses and spyware. Many also come with firewall applications built in, which sometimes interfere with the Microsoft firewall and anti-virus that are installed on the computer by default. It is possible to over-protect a computer by having too many anti-virus or firewall applications installed. Make sure you have only one of each running.

Of course, the biggest concern with privacy on public computers is Internet browsing. With the recent change in Congress regarding the rights that ISPs have to collect and use the private Internet behavior of their customers, the ability to browse the Internet privately and securely is more important than ever.

There are many ways to protect user privacy on the Internet. The quickest is to install browser plugins that attempt to block advertising and tracking on pages. Plugins like Privacy Badger from the EFF or Disconnect.me do a decent job of blocking tracking cookies and tracking scripts embedded in pages. Other plugins like HTTPS Everywhere from the EFF attempt to force Internet connections to use SSL/TLS encryption (Secure Sockets Layer and its successor, Transport Layer Security) in order to prevent the transmission from being intercepted and read. VPNs (Virtual Private Networks) are another strong way to protect Internet connections. Installing a VPN on public computers makes the traffic coming from them appear to come from somewhere else, and because that traffic is encrypted on its way to the VPN provider, the local network and ISP cannot readily see who is using it or what they are doing. A VPN subscription costs some money, but it provides a powerful privacy solution. The Tor Browser is free and provides a very high level of private browsing, but it does come with drawbacks. It encrypts Internet traffic and routes it through a series of nodes on the Internet, each providing its own layer of privacy. These multiple layers of encryption and routing slow down the connection and make some types of traffic, like video, unusable.
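
To illustrate what "forcing" a secure connection means, here is a deliberately simplified Python sketch that rewrites a plain http address to https. It is not how HTTPS Everywhere is implemented; as I understand it, that plugin relies on per-site rules, whereas this sketch upgrades every http URL unconditionally and assumes the site actually supports HTTPS. The function name and example addresses are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def upgrade_to_https(url):
    """Return the URL with its scheme changed from http to https."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return url  # already https, or not a web address at all
    netloc = parts.netloc
    # An explicit ":80" is the plain-HTTP port and would break an https URL.
    if netloc.endswith(":80"):
        netloc = netloc[:-3]
    return urlunsplit(parts._replace(scheme="https", netloc=netloc))


if __name__ == "__main__":
    print(upgrade_to_https("http://example.org/catalog?q=privacy"))
```

A real plugin also has to cope with sites that do not offer HTTPS at all, which is part of why blanket rewriting like this is only a teaching example.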

While we cannot guarantee privacy on public computers, we can offer our patrons an experience that gives them more protection than they would be getting elsewhere. Using a combination of PC configuration and browser configuration or plugins will go a long way in providing privacy. Of course, do not forget to educate your users on how they can be safe on the Internet!

A Toolkit to Audit Your Library’s Privacy Practices

Posted on May 4, 2017 in Choose Privacy Week, libraries, Privacy Awareness, Privacy Policies, Protecting Privacy, reader privacy, Vendor Privacy

By Sarah Houghton
Director, San Rafael Public Library (California)

In this brave new world do you find yourself wondering how to ensure that your library is protecting your users’ privacy to the best of your ability? Not sure where to start? Check out the Library Patron Privacy Checklists, a joint effort from LITA’s Patron Privacy Interest Group and the ALA Intellectual Freedom Committee’s Privacy Subcommittee.

No matter what kind of library you work in, how big it is, or how much control you have over your IT infrastructure, these checklists can help you conduct a comprehensive audit of library user data collection, retention, submission, and security. This set of seven checklists will help your library take practical steps to implement the principles that are laid out in the ALA Library Privacy Guidelines.

Better yet, each checklist is organized into three priority groups. Priority 1 actions are those that hopefully all libraries can take to improve privacy practices. Priority 2 and Priority 3 actions can be achieved by most libraries, but may depend on your organizational structure, control over infrastructure, technical expertise of staff, and resources.

The checklists cover:

You’ll find simple and practical tips like destroying any documents with user data on them, making sure your library actually has a privacy policy, and changing default passwords. But you’ll also find very technical and specific guidelines like encrypting data communications between client and server applications, specific terms to look for in license agreements, and installing plugins on public computers to limit third party tracking.

In addition, each checklist includes a list of resources to point you in the right direction if you need help achieving a goal. Basically, it’s privacy best practices in a box!