The data analytics triangle: Evaluating costs and benefits
On a personal note, I prefer using triangles as a visual aid to guide decision-making. When I started with data analysis and descriptive data analytics, I kept coming back to three key decision points. I wrote these three conditions down as a triangle because it gave me an effective visual for weighing costs and benefits as I worked through the decision of whether to apply data analytics to a particular audit step. If all three of the criteria below are met, the benefits likely outweigh the costs, and I can proceed with leveraging data analytics for the task. However, if one side of the triangle is weak (for example, if the data is highly unstructured, making it usable for the engagement would cost additional weeks, and reporting deadlines are at risk), I might forgo the benefit of studying the data set for insights.
Part 1: Understanding the business process, risks, and management's use of analytics
This remains a fundamental skill in planning and performing an audit, but it is also the first stage gate in my cost/benefit triangle. If I can gather the data but do not understand the business process and its associated risks, there is still a significant risk that I will identify false positives in the audit. Further, as part of routine test-of-design walkthroughs, we can inquire of management to understand whether they are already performing some degree of data analysis/data analytics in managing the day-to-day operations of the business unit. Collaborating with management on this topic is a great way to strengthen your relationship with the business unit. During the audit, we can also evaluate the strengths of their data analytics model to make the process even better.
Part 2: Identifying "bright-line" rules in the business process
Reflecting on my undergraduate and graduate accounting classes, several accounting textbooks referenced “bright-line” rules from authoritative guidance that shape financial accounting and tax concepts. I use the phrase here as well when deciding whether to use data or data analysis techniques. The closer we can get to a binary “yes/no” condition for a particular attribute, the easier it will be to identify anomalies in the population. The risk is that the more ambiguity is involved, the more time we may spend studying false positives. If this side of the triangle is weak, I would consider passing on data analytics for the task, especially if you are just starting out with data analytics in your audit workflow; it is best to avoid that “climbing a mountain” feeling. Financial, transactional information is a good starting point for a potential data analytic test because there are typically bright-line rules that must be satisfied before a transaction is recorded in the general ledger.
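To make the idea concrete, a bright-line rule reduces an attribute test to a binary pass/fail check, so exceptions can be flagged mechanically across an entire population. The sketch below is illustrative only; the ticket fields and the approval rule are assumptions, not drawn from any real system.

```python
# Hypothetical sketch: flag every record in a population that fails a
# binary bright-line rule. Field names ("id", "approved") are illustrative.

def find_exceptions(records, rule):
    """Return the records that fail a binary bright-line rule."""
    return [r for r in records if not rule(r)]

tickets = [
    {"id": "T-001", "approved": True,  "access_granted": "read"},
    {"id": "T-002", "approved": False, "access_granted": "admin"},
    {"id": "T-003", "approved": True,  "access_granted": "read"},
]

# Bright-line rule: every ticket must carry an approval before access is granted.
exceptions = find_exceptions(tickets, lambda r: r["approved"])
print([r["id"] for r in exceptions])  # → ['T-002']
```

Because the rule is binary, every record is either clean or an exception; there is no gray area to spend time interpreting, which is exactly what keeps false positives down.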
Part 3: Assessing data availability and usability
This is key. More likely than not, the data will be available. As described above, there are two potential challenges. The first roadblock might be accessing the data. The second is the format, and weighing how much time must be spent getting the data into a workable format. The risk is that without good data cleansing techniques, we may end up spending too much time chasing false positives throughout the population.
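As a minimal sketch of what that cleansing step can look like: raw exports often mix date formats and carry stray whitespace in headers and values, and normalizing them up front is what keeps later comparisons from throwing off false positives. The column names and date formats below are assumptions about a hypothetical export.

```python
# Illustrative cleansing step for a hypothetical ticket export:
# strip whitespace from keys/values and normalize mixed date formats.
from datetime import datetime

def clean_row(row):
    # Trim stray whitespace from both column names and values.
    row = {k.strip(): v.strip() for k, v in row.items()}
    # Try the date formats we expect in this (assumed) export.
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            row["closed"] = datetime.strptime(row["closed"], fmt).date()
            break
        except ValueError:
            continue
    return row

raw = {" system ": " ERP ", "closed": "01/04/2023 "}
print(clean_row(raw))  # → {'system': 'ERP', 'closed': datetime.date(2023, 1, 4)}
```

Once every row passes through a step like this, date arithmetic and filtering behave consistently across the whole population.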
Examples of how to apply the data analytics triangle
Using the decision-making framework outlined above, the following are two examples that highlight the factors I considered in deciding whether a particular audit engagement was suited to leveraging large data sets for analysis and descriptive analytics.
Example 1: Logical access audit
Part 1: Understanding the business process, risks, and management's use of analytics.
The logical access team was responsible for managing access to IT resources by granting or removing access in accordance with an approved request. The primary risks inherent in the process were unauthorized access to IT resources, access that exceeds or involves mismatched job responsibilities, and access not being revoked in a timely manner.
Given my understanding of the business process, I developed several hypotheses that I wanted to evaluate against an entire population of tens of thousands of tickets. For example, I wanted to use the data set to understand which systems had the greatest number of logical access requests during the audit period; which systems had “bunches” of logical access requests in short periods of time (likely the result of a new system, or an indication of something else); and the elapsed time between the initial request, approval, and ticket closure. We also inquired of management to understand what metrics and data techniques they were using to help manage the day-to-day, which led to productive and collaborative conversations about “a day in the life” from management’s standpoint.
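Hypotheses like these translate directly into simple aggregations over the ticket population. The sketch below shows two of them on a toy data set; the field names ("system", "opened", "closed") and values are assumptions for illustration, not the actual ticketing system's schema.

```python
# Hedged sketch of two of the hypotheses above, run over an illustrative
# ticket population. Field names and dates are assumptions.
from collections import Counter
from datetime import date

tickets = [
    {"system": "ERP", "opened": date(2023, 1, 2),  "closed": date(2023, 1, 4)},
    {"system": "ERP", "opened": date(2023, 1, 3),  "closed": date(2023, 1, 3)},
    {"system": "CRM", "opened": date(2023, 3, 10), "closed": date(2023, 3, 25)},
]

# Hypothesis 1: which systems drew the greatest number of access requests?
by_system = Counter(t["system"] for t in tickets)
print(by_system.most_common(1))  # → [('ERP', 2)]

# Hypothesis 2: elapsed days from initial request to ticket closure.
elapsed = [(t["system"], (t["closed"] - t["opened"]).days) for t in tickets]
print(elapsed)  # → [('ERP', 2), ('ERP', 0), ('CRM', 15)]
```

The same pattern extends to spotting “bunches”: count tickets per system per week and look for outlier weeks.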
Part 2: Identifying "bright-line" rules in the business process.
This IT team provisioned or deprovisioned access after the request was approved by the environment owner. Further, the access that was granted needed to match the requested access. And, in the case of deprovisioning access, that step was time-bound. Therefore, there were clear bright-line rules embedded in the business process.
Part 3: Assessing data availability and usability.
The logical access team granted the internal audit team read-only access to the ticketing system. The data was structured, did not require much cleansing, and was exportable to Excel, where I could filter, apply formulas, and create charts and graphs. When it came to evaluating other control attributes that required additional data sources, such as access listings, those were in a structured format from a database query or Excel export. Therefore, it made sense to invest time in applying some degree of data analysis and diagnostic analytics for this engagement.
How did I apply data analysis and descriptive analytic techniques?
Understanding that all the activity had already taken place, I would classify my approach as a series of data analysis procedures with limited descriptive analytics.
The analysis was developed in Excel, using common techniques such as filters, formulas, pivot tables, and charts. Because the entire population of tickets was evaluated, anomalies stood out: date ranges with spikes in requested access, tickets that exceeded the average closure time, the requesting party who filed the greatest number of requests, the employee who approved the greatest number of requests, and the total number of requests for access to the systems classified as the most restrictive in the organization.
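The Excel-style checks above can be sketched in a few lines of plain Python as well; the ticket fields, names, and day counts below are illustrative assumptions, not the engagement's actual data.

```python
# Sketch of three of the anomaly checks described above, over a toy
# population. Field names and values are illustrative assumptions.
from collections import Counter
from statistics import mean

tickets = [
    {"id": "T-1", "requester": "ana", "approver": "bo", "days_open": 2},
    {"id": "T-2", "requester": "ana", "approver": "cy", "days_open": 3},
    {"id": "T-3", "requester": "dee", "approver": "bo", "days_open": 21},
]

# Tickets that exceeded the average closure time.
avg = mean(t["days_open"] for t in tickets)
slow = [t["id"] for t in tickets if t["days_open"] > avg]

# Requesting party with the most requests, and approver with the most approvals.
top_requester = Counter(t["requester"] for t in tickets).most_common(1)
top_approver = Counter(t["approver"] for t in tickets).most_common(1)

print(slow)           # → ['T-3']
print(top_requester)  # → [('ana', 2)]
print(top_approver)   # → [('bo', 2)]
```

Evaluating the full population this way is what lets outliers surface on their own, rather than relying on a sample to happen to contain them.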
This analysis was done during the planning stage of the engagement. The results provided key metrics, such as the total number of requests processed by the team, that set the context for the planning memo and for the audit report. Further, it gave our engagement team an opportunity to reflect and organize our level of effort for specific systems to apply more granular audit procedures.