From autonomous vehicles to deepfake videos, there are many artificial intelligence and deep learning applications that are making people anxious. Do we want cars making split-second decisions between two tragic outcomes? Will we reach a point where it’s impossible to know whether people depicted in an online video are real?
Consider the idea of completely automated tax returns, with a computer predicting citizens’ tax liability and tax rates. Millions of people already put their trust in systems written by tax experts that function as a machine representation of the tax code. That is essentially how software like TurboTax works.
But what about entrusting tax returns to a computer-crafted inference (scoring) model from a deep neural network trained on millions of actual, past returns from other taxpayers?
It makes me a bit nervous, letting a computer predict my tax rate and liability with only minimal input from me. Yet while there are good reasons to be anxious, there is also great potential in leveraging AI and deep learning to help with the tax process.
Tax fraud has existed in many forms for years, but it has become particularly widespread with the huge increase in identity theft. According to the Identity Theft Resource Center’s 2018 End-of-Year Data Breach Report, while the number of U.S. data breaches decreased 2% to 1,244, the reported number of consumer records exposed containing sensitive personally identifiable information jumped 126% to nearly 450 million.
To defeat fraud controls at the state and federal levels, identity thieves need more data points. So, while the overall number of breaches has decreased, the drastic increase in exposed consumer records is alarming. Essentially, the fraudsters are collecting more data to improve the efficiency of their attacks.
You’ve probably heard the old saying, “garbage in, garbage out.” That also applies to deep learning fraud models trained on historical tax return data. That training data contains everything from completely manufactured tax returns (used to commit refund fraud, following identity theft) to real citizens hiding their income and inflating their deductions. So the first priority should be ensuring tax bills are not based on someone else’s trickery.
Second, we want our tax return to be just that: ours. It must be personalized for our individual situation, not an assessment that is correct on average across a population of individuals who are sort of like us. Neural networks used in deep learning can be statistically impressive across a population yet unreliable for any one individual.
Finally, taxpayers (and tax agencies) want transparency. We did not develop the logic of the tax algorithm — the neural network did. We do not own the logic, and therefore we cannot explain the decision of the algorithm to set a person’s tax rate at 20% or 25%. When talking about deep learning, this is often referred to as the “black box” problem.
Compared to other types of data-driven analysis, the amount and quality of the data are more important with deep learning models. If a pattern is not in the data, it cannot be learned. Conversely, if bad patterns like fraud are in the data, they will be incorporated in the prediction. Additionally, bias and misrepresentations can be transferred to the final model.
For example, if gender or neighborhood is used as a predictor in the model, the model could assign higher tax rates to men or to individuals living in particular neighborhoods. Even if statistically accurate on average, the model would incorporate factors that could be considered discriminatory.
Where could AI and deep learning help in tax?
While this picture of deep learning for tax preparation sounds bleak and ominous, there are places in tax administration where deep learning is appropriate and beneficial.
In a future world of computer-generated tax returns, taxation of personal and business income would be closer in concept to what is currently possible for property tax assessments, where AI and property appraisers work in tandem to derive a final value. Wake County, N.C., uses this approach. Not only does the system identify property value outliers that need to be investigated, it also gives agencies an objective measure to back up their own appraisals, which is useful when citizens appeal.
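To make the idea of flagging property value outliers concrete, here is a minimal sketch in Python. It is not a description of Wake County's actual system. It assumes a model has already produced an estimated value for each property, and it flags any appraisal whose ratio to that estimate sits far from the typical ratio, using a modified z-score based on the median absolute deviation. The function name, the dollar figures and the 3.5 cutoff are all illustrative assumptions.

```python
import statistics


def flag_outliers(appraised, estimated, threshold=3.5):
    """Return indexes of properties whose appraised/estimated ratio is unusual.

    Hypothetical helper: uses a modified z-score built on the median absolute
    deviation (MAD), which is less distorted by extreme values than a mean.
    """
    ratios = [a / e for a, e in zip(appraised, estimated)]
    median_ratio = statistics.median(ratios)
    mad = statistics.median(abs(r - median_ratio) for r in ratios)
    if mad == 0:
        return []  # every ratio is identical, so nothing stands out
    return [
        i
        for i, r in enumerate(ratios)
        if 0.6745 * abs(r - median_ratio) / mad > threshold
    ]


# Made-up values in dollars; the fifth appraisal is double the model estimate.
appraised_values = [200_000, 310_000, 150_000, 420_000, 900_000, 275_000]
model_estimates = [205_000, 300_000, 155_000, 410_000, 450_000, 280_000]

needs_review = flag_outliers(appraised_values, model_estimates)
print(needs_review)
```

In this sketch the flagged property is only queued for a human appraiser to review. The model supplies the objective comparison, and the appraiser still makes the final call.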
The strength of deep learning is not to capture logic, but to identify patterns. Instead of trying to train an algorithm to predict tax liability or tax rate, let’s train it to detect fraudulent returns, predict which taxpayers are likely to call in for assistance or dynamically personalize the taxpayer self-service portal based on which type of taxpayer is logged in.
Let’s teach algorithms to spot fraud by training them on tax returns previously classified by a human expert as legitimate or fraudulent. Teach them what fraud “looks like” on average. Once trained, the algorithm can estimate the probability that any tax return (or bank account, tax preparer or group of tax returns) is fraudulent.
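As a rough illustration of that training loop, the sketch below fits a small logistic regression (a much simpler stand-in for a deep neural network) on a handful of returns that an expert has labeled. Everything here is an assumption made for illustration: the three feature names, the made-up values, the labels and the training settings. A real system would use far more data and far richer features.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(rows, labels, epochs=2000, lr=0.1):
    """Fit weights and a bias with plain batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss w.r.t. the score
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def fraud_probability(weights, bias, x):
    """Probability, under the fitted model, that a return is fraudulent."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical features, each scaled to the range 0..1:
# [refund-to-income ratio, deductions-to-income ratio, shared bank account flag]
RETURNS = [
    [0.05, 0.10, 0.0], [0.08, 0.15, 0.0], [0.10, 0.20, 0.0], [0.12, 0.12, 0.0],
    [0.60, 0.70, 1.0], [0.75, 0.65, 1.0], [0.55, 0.80, 0.0], [0.80, 0.90, 1.0],
]
# Expert labels: 0 = legitimate, 1 = fraudulent (made up for this sketch).
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic(RETURNS, LABELS)
p_ordinary = fraud_probability(weights, bias, [0.07, 0.12, 0.0])
p_suspicious = fraud_probability(weights, bias, [0.70, 0.75, 1.0])
print(f"ordinary-looking return:   {p_ordinary:.2f}")
print(f"suspicious-looking return: {p_suspicious:.2f}")
```

The output is a probability rather than a verdict, which fits the workflow described here: the score ranks returns for review, and a person decides what happens next.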
To address transparency concerns, we need “white box” models that inform the auditor or fraud investigator why the algorithm flagged the item as possible fraud. These are excellent and typical applications of machine learning and are used today by many advanced tax authorities around the globe.
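One simple way to picture a “white box” model is a linear score whose result can be broken down feature by feature. The sketch below is an assumption-laden illustration, not a description of any tax authority's system: the weights, the population baselines and the feature names are all invented. Each feature's contribution is its weight multiplied by how far the return departs from the baseline, and the contributions are sorted so the investigator sees the strongest reasons first.

```python
# Hypothetical weights for a linear fraud score (log-odds scale).
WEIGHTS = {
    "refund_to_income_ratio": 3.0,
    "deductions_to_income_ratio": 2.0,
    "shared_bank_account": 1.5,
}

# Hypothetical population averages used as the comparison point.
BASELINE = {
    "refund_to_income_ratio": 0.10,
    "deductions_to_income_ratio": 0.15,
    "shared_bank_account": 0.0,
}


def explain(features):
    """Return (feature, contribution) pairs, largest absolute effect first."""
    contributions = {
        name: WEIGHTS[name] * (features[name] - BASELINE[name])
        for name in WEIGHTS
    }
    return sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)


# A made-up return that the model has flagged.
flagged_return = {
    "refund_to_income_ratio": 0.70,
    "deductions_to_income_ratio": 0.20,
    "shared_bank_account": 1.0,
}

reasons = explain(flagged_return)
for name, contribution in reasons:
    print(f"{name}: {contribution:+.2f}")
```

Here the unusually large refund and the shared bank account account for nearly all of the score, while the deductions barely matter. That is the kind of plain statement an auditor can check against the return itself.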