Detecting Abuse in Customer Service Applications

Customer service applications provide administrative access to a system in order to deal with support, abuse, or safety concerns. They are most common in products adjacent to social media and content management, but may exist for nearly any system. This post is about those that deal with user-submitted content in some form or another, where the subjects aren’t also employees.

I’m going to try to explain four maturity levels that customer service applications go through. These practices are implemented, in whole or in part, at multiple government agencies and commercial companies. Part of the reason for writing this is that it is difficult to find any discussion of best practices, or even of current practices, from any organization. This is just a first pass, and hopefully I’ll get back to writing more about this specific topic.

General Guidelines

There are some general guidelines that should be followed:

Customer service systems should exist for specifically that purpose. As such, they should not become a dumping ground for a multitude of non-customer service components.
Access to these systems must be reviewed regularly, and that review should be based on a strict need-to-know basis.

These are just general best practices, but you would be surprised at the number of systems that accrete substantial additional functionality over time that gets used in ways that were never expected.

The Levels

Level 1: All or Nothing

At this level, anyone with access to the system has access to everyone’s data with minimal restrictions. There may or may not be some controls around especially sensitive data¹, and generally there is only some form of application logging (e.g., web server logs), not a real audit trail.

Level 2: Encouragement

At this point, the system begins to encourage agents to behave properly. It’s built around four ideas: data masking, intentional disclosure, match only, and visible auditing.

Data Masking — Data that isn’t shown isn’t disclosed. By masking/hiding data that is sensitive, we reduce the casual disclosure of data.
Intentional Disclosure — Require some kind of intentional behavior to view sensitive information.
Match Only — Don’t reveal information, but allow a mechanism to confirm data.
Visible Auditing — Make it obvious to the users that you are auditing what they do.

It is important that each of these mechanisms be paired with confirmation and audit to create an incentive structure for the agent.

Data Masking

Rather than showing a user’s birthday, show their age group, or something like “COPPA Subject”.
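
As a rough sketch of what that masking could look like, the function below turns a birth date into a coarse label before anything reaches the agent's screen. The age bands, labels, and function names are my own illustration, not taken from any particular system.

```python
from datetime import date

def age_on(birthday: date, today: date) -> int:
    """Whole years elapsed between birthday and today."""
    had_birthday = (today.month, today.day) >= (birthday.month, birthday.day)
    return today.year - birthday.year - (0 if had_birthday else 1)

def masked_age_group(birthday: date, today: date) -> str:
    """Return a coarse label instead of the birth date itself."""
    age = age_on(birthday, today)
    # Hypothetical bands; pick whatever groups your policies actually need.
    if age < 13:
        return "Under 13"
    if age < 18:
        return "13-17"
    return "18+"
```

The agent-facing view would call `masked_age_group` and never receive the raw date at all, so there is nothing to casually disclose.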

Intentional Disclosure

Confirmation is the idea that an agent should have to take an affirmative step to view sensitive (often masked) data. This can take one of two forms:

Clicking a button/link that reveals the data, and generates an audit trail.
Requiring the user to provide a “reason” for accessing the information if it’s something that should not generally be needed. An example may be “Why do you need access to this user’s records?”
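
Both forms boil down to the same shape: no reveal without a recorded, affirmative step. Here is a minimal sketch, assuming an in-memory audit log and a customer record stored as a plain dictionary; `AuditLog` and `reveal_field` are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AuditLog:
    """In-memory stand-in for a real audit store (an assumption for this sketch)."""
    entries: list = field(default_factory=list)

    def record(self, agent: str, subject: str, field_name: str, reason: str) -> None:
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "agent": agent,
            "subject": subject,
            "field": field_name,
            "reason": reason,
        })

def reveal_field(record: dict, field_name: str, agent: str,
                 reason: str, audit: AuditLog) -> str:
    """Return a sensitive value only after a reason is given and the access is logged."""
    if not reason.strip():
        raise ValueError("a reason is required to reveal this field")
    # Write the audit entry before handing back the value.
    audit.record(agent, record["username"], field_name, reason.strip())
    return record[field_name]
```

The ordering matters: the audit entry is written first, so a value is never returned without a corresponding record.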

Match Only

For example, instead of showing a user’s email, you could provide a “confirm email” link that asks the agent to enter the email, and simply confirms whether it is correct or not. Another example, along the lines of identity proofing, would be asking “Who is someone you have recently interacted with?” and providing only a confirmation, or not. It is helpful to provide fuzzy matching in these places for a better customer experience.
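
A match-only check with a bit of fuzziness might be sketched like this. It uses Python's standard `difflib.SequenceMatcher`; the 0.9 threshold and the function names are assumptions you would want to tune, since a looser threshold makes guessing easier.

```python
import difflib

def normalize(value: str) -> str:
    """Ignore case and surrounding whitespace when comparing."""
    return value.strip().lower()

def confirm_match(stored: str, candidate: str, threshold: float = 0.9) -> bool:
    """Report only whether the agent's input matches, never the stored value."""
    a, b = normalize(stored), normalize(candidate)
    if a == b:
        return True
    # Similarity ratio in [0, 1]; the threshold is a hypothetical starting point.
    return difflib.SequenceMatcher(None, a, b).ratio() >= threshold
```

The caller only ever sees `True` or `False`, so the stored value stays undisclosed even when the agent's guess is close.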

Visible Auditing

While many systems provide an audit trail (“who did what when to whom and why?”), that trail is typically not visible to most users. If there is no audit trail, then there is no ability to investigate problems, or even detect problems in the first place. Making that visible helps agents understand that their actions are being recorded when they interact with customer data. Some best practices:

Any retrieval of customer data requires a ticket reference.
Access to customer data requires a “reason”. If this isn’t encoded in the ticket, it should be provided otherwise. That ticket reference number should be displayed prominently in the user interface, and should be easy to switch to a new one as needed. Links from external systems (especially case-management) should inject the ticket/case number into the system for the user. This can be done through URL parameters easily.
Communicate to the agent in training that all actions are linked to the case.
Auditing should also happen to another structured repository (an audit service for example) with identical data.
Auditing into the ticket should be clear and human readable. For example: “User susanj viewed jdoe account.” If there is a way to link this entry to the structured audit logs, using a UUID for example, that can be very helpful.
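
To make a few of these concrete, here is a small sketch that reads the ticket reference from a URL parameter and produces both a structured audit event and a human-readable ticket note that share one UUID. The parameter name `ticket`, the event fields, and the function names are assumptions for illustration.

```python
import uuid
from urllib.parse import urlparse, parse_qs

def ticket_from_url(url: str) -> str:
    """Read the ticket reference injected by the case-management link."""
    params = parse_qs(urlparse(url).query)
    values = params.get("ticket")  # hypothetical parameter name
    if not values:
        raise ValueError("no ticket reference in URL")
    return values[0]

def audit_view(agent: str, subject: str, ticket: str) -> tuple:
    """Build a structured audit event and a matching human-readable ticket note."""
    event_id = str(uuid.uuid4())
    event = {
        "id": event_id,
        "agent": agent,
        "action": "view_account",
        "subject": subject,
        "ticket": ticket,
    }
    # The shared ID lets an investigator jump from the ticket to the audit service.
    note = f"User {agent} viewed {subject} account. (audit {event_id})"
    return event, note
```

The event would go to the audit service and the note into the ticket, so the same action is recorded in both places with the same identifier.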

In many ways, this is similar to behavioral encouragement in physical security. For example, there are multiple studies showing that the majority of the benefit of CCTV cameras comes from their deterrent value to would-be thieves. Whether the cameras are monitored or not has minimal real-time impact, and the footage is mostly useful for reconstructing past events. This is why, in many low-value situations, many of the CCTV domes do not contain actual cameras.

Level 3: Detection

Once we have encouraged better behavior, reducing the false positives in the system, we are left with two primary goals: investigation of the resulting issues and confirmation that the system is behaving as intended. Without delving into the nuance between the two, there are several practices that can be helpful in supporting both that can be broken into groups.

First, human-based detection systems:

Managers are provided a daily summary of their agents’ data access patterns, and any outliers.
Random sampling of cases for review by independent reviewers.
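
The daily summary could start out as simple as the sketch below: count the distinct customers each agent touched and flag anyone far above the team's median. The event shape and the "three times the median" rule are assumptions of mine, not an established method.

```python
from collections import defaultdict
from statistics import median

def daily_summary(events, factor=3.0):
    """Count distinct customer records per agent and flag unusually high counts."""
    seen = defaultdict(set)
    for event in events:
        seen[event["agent"]].add(event["subject"])
    counts = {agent: len(subjects) for agent, subjects in seen.items()}
    typical = median(counts.values()) if counts else 0
    # Hypothetical rule: an outlier accessed more than `factor` times the median.
    outliers = sorted(a for a, n in counts.items() if typical and n > factor * typical)
    return counts, outliers
```

A manager would still need to read the outliers in context; a high count may just mean a busy day.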

Then, we have the automated detection systems, either real-time or batch:

Post-processing mapping of audited access to records to the subject of the case.
Honeypot accounts, where even an attempt to view one triggers an audit, a review, and (optionally) a request for justification.
Risk scoring. Agents accumulate “risk” based on data access and actions, and that risk is balanced by ratings (based on type) of the cases they worked. This looks at risk in aggregate, and can prompt investigation of the out-of-balance agents. I intend to explore this in a future post.
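
A batch version of these checks might look something like the sketch below. It flags accesses to honeypot accounts and accesses to records that are not the subject of the case they were logged against, and computes a very crude risk balance. Every name, weight, and data shape here is hypothetical.

```python
HONEYPOT_ACCOUNTS = {"honey_001"}  # hypothetical decoy account IDs

# Hypothetical weights: risk per action, and allowance per case type worked.
ACTION_RISK = {"view_account": 1, "reveal_email": 3, "change_email": 10}
CASE_ALLOWANCE = {"password_reset": 5, "abuse": 20}

def flag_accesses(audit_events, case_subjects):
    """Return (event, reason) pairs for accesses that warrant review."""
    flagged = []
    for event in audit_events:
        subject = event["subject"]
        if subject in HONEYPOT_ACCOUNTS:
            flagged.append((event, "honeypot"))
        elif subject not in case_subjects.get(event.get("ticket"), set()):
            # The record viewed is not a subject of the case it was logged under.
            flagged.append((event, "not_case_subject"))
    return flagged

def risk_balance(actions, case_types):
    """Accumulated action risk minus what the agent's worked cases would justify."""
    incurred = sum(ACTION_RISK.get(action, 1) for action in actions)
    allowed = sum(CASE_ALLOWANCE.get(case_type, 0) for case_type in case_types)
    return incurred - allowed
```

A positive balance does not prove abuse; it only suggests which agents are worth a closer look.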

Level 4: Enforcement

Enforcement is related to the concept of attribute-based access control (ABAC), which is sometimes called policy-based, or claims-based, access control. Specifically, it is designed to prevent access to data that the agent shouldn’t be using. The reason this is considered the most advanced stage is that it requires mature processes with well-defined data requirements, careful training, and advanced data modeling and understanding that can take years for an organization to develop, if ever. If you can’t detect poor behavior, how can you hope to enforce preventative controls in real-time?

Some scenarios that can be modeled and managed this way:

Agent A is assigned case B for user C. The system enforces that because case B is a password reset case, agent A can only view a subset of data for C so long as B is open. Additionally, their actions are restricted.
Agent A is assigned case B for user C. Because this case is an abuse case, it may restrict A to viewing data only for C and for users within one degree of social connection of C.
A case is being investigated which involves abuse from a specific IP, so the agent is restricted to only users associated with that IP. To expand it, it might require a manager or someone else adding additional IP address scope to the case.
Agents cannot view their own data, or that of other agents in the system, without a case approved by a manager.
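
A minimal sketch of the first scenario as an attribute-based check is below. The decision depends on attributes of the agent, the case, and the data requested. The case types, field names, and the `Case` and `can_view` names are assumptions, and the manager-approval path is left out for brevity.

```python
from dataclasses import dataclass

# Hypothetical mapping of case type to the fields an agent may read.
ALLOWED_FIELDS = {
    "password_reset": {"username", "email_masked", "last_login"},
    "abuse": {"username", "email", "posts", "connections"},
}

@dataclass(frozen=True)
class Case:
    case_id: str
    case_type: str
    assigned_agent: str
    subject: str
    is_open: bool = True

def can_view(agent: str, subject: str, field_name: str, case: Case) -> bool:
    """Allow access only through an open, assigned case covering this subject and field."""
    if not case.is_open or case.assigned_agent != agent:
        return False
    if case.subject != subject:
        return False
    if agent == subject:
        return False  # no self-service; manager-approved cases are omitted here
    return field_name in ALLOWED_FIELDS.get(case.case_type, set())
```

The default is to deny: an unknown case type, a closed case, or a field outside the allowed set all fall through to `False`.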

Some specific situations that have been used elsewhere:

Agent A cannot work on a case of anyone related to them. This requires a thorough relationship graph to be maintained of people.
Agents cannot work on cases related to people in their reporting chain without special approval.
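
These two rules could be checked at assignment time along the lines of the sketch below. It assumes a `manager_of` mapping (person to their manager) and a `related` mapping (agent to the set of people related to them). Both structures are hypothetical, and maintaining them is the hard part.

```python
def reporting_chain(person, manager_of):
    """Return everyone above the person, following manager links upward."""
    chain, seen = [], {person}
    current = manager_of.get(person)
    while current is not None and current not in seen:  # guard against cycles
        chain.append(current)
        seen.add(current)
        current = manager_of.get(current)
    return chain

def may_work_case(agent, subject, manager_of, related):
    """Block cases whose subject is related to the agent or in the same reporting line."""
    if subject in related.get(agent, set()):
        return False
    if subject in reporting_chain(agent, manager_of):
        return False  # subject is somewhere above the agent
    if agent in reporting_chain(subject, manager_of):
        return False  # subject is somewhere below the agent
    return True
```

A `False` here would route the case to someone else or to the special-approval path rather than silently dropping it.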

As you can see, this requires a more complex model of both our data, and our customer support cases, and therefore is something to keep in mind for later.

Multi-Party Control

Orthogonal to the above levels, but potentially associated with Levels 2 and 3, is the concept of multi-party control, or multi-party authorization. The short description is that it takes two (or more) people to perform certain high-risk activities. For example, in a typical Internet application’s customer service system, you might allow a single agent to send a reset email to the registered email of the user, but require multi-party control over changing that email to reduce the chance of account takeover.

Basically, it requires one person to set up the action, and another to approve/execute it. This flow serves the purpose of both quality control and abuse protection. It is a form of affirmative consent in its nature. It requires a few things to be true of the non-originating party:

They are fully informed of what they are agreeing to;
They are freely able to refuse to consent;
They clearly demonstrate their consent to the action.
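
The set-up/approve split can be sketched as a pending action that a second person must explicitly decide on. The class and function names below are hypothetical, and a real system would also need authentication, expiry, and its own audit trail.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PendingAction:
    description: str                  # shown in full to the approver
    requested_by: str
    decided_by: Optional[str] = None
    approved: Optional[bool] = None   # None until someone decides

    def decide(self, approver: str, approve: bool) -> None:
        """Record an explicit approve or refuse decision from a second person."""
        if approver == self.requested_by:
            raise PermissionError("requester cannot approve their own action")
        if self.approved is not None:
            raise RuntimeError("action already decided")
        self.decided_by, self.approved = approver, approve

def execute(action: PendingAction, perform) -> bool:
    """Run the action only if a different person explicitly approved it."""
    if action.approved is not True:
        return False
    perform()
    return True
```

Refusal is a first-class outcome here (`approve=False`), and doing nothing never counts as consent: an undecided action simply does not execute.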

There are a ton of potential areas of concern in implementation. It may be helpful to start with this paper. While these risks are much larger in systems that cross organizational boundaries, they are potentially present even internally.