Steps to applying data analytics
Five steps for applying data analytics appear consistently across the available literature:
- Define Your Objective
- Understand Your Data Source
- Prepare Your Data
- Analyze Data
- Report on Results
Each step is critical, and each has its own challenges. Understanding and overcoming those challenges requires a deeper look at each step, which is best achieved by first posing introspective questions for each one.
Step 1 - Define your objective
Ask the following questions:
- What are you trying to achieve?
- What could the result look like?
Before you even start looking at the data, decide what you think the data could tell you. Imagine what the results could be. For example, if you are going to look at a listing of transactions from an accounts payable system, you might come up with objectives such as looking for transactions that are not supported by purchase orders, or looking at the days of the week on which transactions were posted. The result could show that not all transactions were supported, or that some transactions were posted on the weekend when no one should be working. The ACFE and The IIA both offer courses on data analytics theory that provide examples of the types of analytics and when to use each one. Starting with proper and realistic goals can reduce wasted time as you move through the process.
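To make the first objective concrete, here is a minimal sketch in Python of a test for transactions not supported by purchase orders. The field names (id, po_number, amount) and the sample rows are hypothetical; a real accounts payable extract will have its own layout.

```python
# Hypothetical accounts payable rows; field names are assumptions for illustration.
transactions = [
    {"id": "T1", "po_number": "PO-100", "amount": 250.00},
    {"id": "T2", "po_number": "", "amount": 975.50},
    {"id": "T3", "po_number": "PO-999", "amount": 40.00},
]

# Hypothetical list of valid purchase order numbers.
purchase_orders = {"PO-100", "PO-101"}


def unsupported_transactions(transactions, purchase_orders):
    """Return transactions whose PO number is blank or not in the PO list."""
    return [t for t in transactions if t["po_number"] not in purchase_orders]


unsupported = unsupported_transactions(transactions, purchase_orders)
print([t["id"] for t in unsupported])
```

Writing the objective as a small test like this, even before the real data arrives, helps confirm that the goal is specific enough to be answered.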
Step 2 - Understand your data source
Ask the following questions:
- What information do I need?
- Can I get the data myself, or do I need to ask an IT resource?
Depending on data availability, you may need to combine information from multiple sources. Often, the type of testing you can perform will be dictated by the information that is available to you. For example, let’s say you are planning to look for transactions that were posted on the weekend when no one is supposed to be working. At a minimum, the data you need for this test must include a date/time stamp on each transaction. You will probably want more than just the date/time stamp: you will also want to know who processed the transaction, what the dollar amount was, and whether a supervisor approved it.
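The weekend-posting test described above could be sketched as follows. This is an illustration only: the field names, the timestamp format, and the sample rows are assumptions, and a real extract may store dates differently.

```python
from datetime import datetime

# Hypothetical extract: each row carries the date/time stamp plus the
# supporting fields (who processed it, amount, supervisor approval).
transactions = [
    {"id": "T1", "posted_at": "2024-03-01 14:05:00", "user": "akim",
     "amount": 120.00, "approved_by": "jlee"},
    {"id": "T2", "posted_at": "2024-03-02 09:30:00", "user": "bcho",
     "amount": 4300.00, "approved_by": ""},
    {"id": "T3", "posted_at": "2024-03-03 23:59:00", "user": "akim",
     "amount": 75.25, "approved_by": "jlee"},
]


def posted_on_weekend(timestamp):
    """True if the timestamp falls on a Saturday or Sunday."""
    posted = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return posted.weekday() >= 5  # Monday is 0, so 5 and 6 are the weekend


weekend_ids = [t["id"] for t in transactions if posted_on_weekend(t["posted_at"])]
print(weekend_ids)
```

Without the date/time stamp the test cannot run at all, and without the user and approval fields the flagged rows would be hard to follow up on.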
Your next question will be about actually getting the data. If you have the required access and sufficient training on the systems involved, you may be able to get the information on your own. Realistically, most IT departments do not want us all pulling our own information. It is perfectly acceptable to ask for help and have the IT department extract the information you need. When involving someone else in this process, make sure you give them clear instructions on the data to pull, and set scoping parameters. If you ask for all the data, you may end up with 20 years’ worth of transactions when you only wanted the transactions from this year.
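Scoping parameters can be as simple as a start date and an end date. The sketch below shows one way to express that scope in Python; the dates and rows are hypothetical, and in practice the same limits would normally be given to IT so the filter is applied at extraction time.

```python
from datetime import date


def in_scope(posted_on, start, end):
    """True if an ISO-formatted date string falls within the scope, inclusive."""
    return start <= date.fromisoformat(posted_on) <= end


# Hypothetical (transaction id, posting date) pairs.
rows = [
    ("T1", "2004-06-15"),
    ("T2", "2024-01-10"),
    ("T3", "2024-12-31"),
    ("T4", "2025-01-01"),
]

# Assumed scope: a single calendar year.
scope_start, scope_end = date(2024, 1, 1), date(2024, 12, 31)

scoped = [tid for tid, posted_on in rows if in_scope(posted_on, scope_start, scope_end)]
print(scoped)
```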
Step 3 - Prepare your data
Ask the following questions:
- Does the data need to be cleansed?
- Does the data need to be normalized?
Data preparation includes many different aspects, so we will focus on two of the broadest, most encompassing points: cleansing the data, and normalizing the data. Cleansing data addresses the quality of the information, while normalization eliminates redundancies.
Cleansing the data is especially important when the information is coming from multiple sources. Sometimes you will have a column of text in a spreadsheet, but some of the cells also have numbers, spaces in front of the letters, or symbols in the data. Cleansing the data will remove all of the unrecognizable information from the cells.
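As a rough illustration of cleansing a text column, the sketch below strips stray spaces and drops characters that should not appear in a name field. Which characters count as unrecognizable is an assumption made here for the example; that rule should be decided column by column.

```python
import re


def cleanse_text(value):
    """Keep only letters, spaces, apostrophes, and hyphens, then tidy spacing.

    The allowed character set is an assumption for a name column.
    """
    cleaned = re.sub(r"[^A-Za-z' -]", "", value)
    return re.sub(r"\s+", " ", cleaned).strip()


# Hypothetical cells with a leading space, a symbol, a digit, and trailing spaces.
raw_cells = ["  Smith", "Jones#", "Gar7cia", "Lee  "]
clean_cells = [cleanse_text(cell) for cell in raw_cells]
print(clean_cells)
```

In practice it is worth keeping the original value alongside the cleansed one, so that any removal can be reviewed later.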
Normalizing the data is very closely related to cleansing. Normalizing looks for different versions of the same data entry. For example, you may have the last name O’Brien in your data six different ways: O’Brien, O_Brien, Obrien, O’brien, O_brien, OBrien. These are probably all the same name, just input into systems in different ways. Normalizing converts all of the variations into one format. If you don’t cleanse and normalize the data, the analysis will either produce an error or return results that are unusable or unreliable.
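One simple way to bring the six O’Brien variations into a single format is to build a comparison key that ignores punctuation and letter case. This is a sketch of one possible rule, not the only one; the function name is hypothetical, and the key is meant for matching records rather than for display.

```python
def name_key(name):
    """Build a matching key: letters only, lower case."""
    return "".join(ch for ch in name if ch.isalpha()).lower()


variants = ["O’Brien", "O_Brien", "Obrien", "O’brien", "O_brien", "OBrien"]

# Every variation collapses to the same key.
keys = {name_key(v) for v in variants}
print(keys)
```

A rule this aggressive can also merge names that are genuinely different, so it is sensible to review the groups it produces before relying on them.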
Step 4 - Analyze data
Ask the following questions:
- What tests can I run on the data?
- Is help available to understand results?
At this point, you have defined the objective, pulled the data, and spent some time cleansing and normalizing the information. Now it is time to run the tests. Your data analytics tools will help you summarize the information. Again, look to professional organizations like The IIA and the ACFE for training if you are not sure which test to run.
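Summarizing is often the first analysis step. The sketch below, using hypothetical user names and amounts, counts and totals transactions per user; a dedicated analytics tool or spreadsheet pivot table would produce the same kind of summary.

```python
from collections import defaultdict

# Hypothetical (user, amount) pairs taken from a prepared data set.
transactions = [
    ("akim", 120.00),
    ("bcho", 4300.00),
    ("akim", 80.00),
    ("bcho", 15.50),
]


def summarize_by_user(rows):
    """Return the transaction count and total amount for each user."""
    summary = defaultdict(lambda: {"count": 0, "total": 0.0})
    for user, amount in rows:
        summary[user]["count"] += 1
        summary[user]["total"] += amount
    return dict(summary)


summary = summarize_by_user(transactions)
for user, stats in sorted(summary.items()):
    print(user, stats["count"], stats["total"])
```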
Once you run the tests, you may or may not understand the results. Your best resource to understand the results will likely be the people you are auditing. If it is appropriate, you should take the results back to the data owners for help understanding the output. Keep in mind, this may not always be appropriate, especially if this is for a fraud examination.
Step 5 - Report results
Ask the following questions:
- Will management understand the results?
- Can you represent the results visually?
Avoid presenting management with tables full of numbers. We need to effectively communicate results without lengthy explanations. Use charts and graphs with simple notes.
One of the most important factors in helping management focus and understand the results is the amount of information we present. We must be careful not to overwhelm management with endless information. Always present summary information and provide any details as an appendix to the summary. Remember that these are busy people with a very limited amount of time to dedicate to your data, so be succinct and provide your reports well in advance of any meetings.
In line with our basic audit process, we should provide more risk information to management. Feel free to have open, risk-based discussions with them about the results.
Instead of getting mired in the details, where applicable, provide more trending information related to audit results. Showing trends is more illustrative of the organization’s overall status. Examples could be trending by types of findings, results by business unit, or trends in the status of different data classifications. However, if we are dealing with a fraud examination, the exact opposite may be preferable. When we must present the details, be sure to make a clear, unbiased presentation of the test results.
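Trending by type of finding can be prepared with a simple count per period. The sketch below uses made-up findings purely for illustration; the years and finding types are assumptions, and the resulting counts would normally feed a chart rather than be printed.

```python
from collections import Counter

# Hypothetical audit findings recorded as (year, finding type).
findings = [
    (2022, "access control"),
    (2022, "segregation of duties"),
    (2023, "access control"),
    (2023, "access control"),
    (2024, "access control"),
    (2024, "access control"),
    (2024, "access control"),
]

counts = Counter(findings)
years = (2022, 2023, 2024)

# Number of access control findings per year, in year order.
trend = [counts[(year, "access control")] for year in years]
print(dict(zip(years, trend)))
```

A rising or falling series like this gives management the overall picture in one line, with the underlying findings left for an appendix.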
Especially if you are just getting started with a formal data analytics program, following these five simple steps will greatly increase your success in the implementation and use of data analytics to support your audit process.