CIOs and IT directors working on any project that involves data are more likely to succeed when the organisation has a clear view of the data it holds.
Increasingly, organisations are using data classification to track information based on its sensitivity and confidentiality, as well as its importance to the business.
Data that is critical to operations or that needs to be safeguarded – such as customer records or intellectual property – is more likely to be encrypted, to have access controls applied, and to be hosted on the most robust storage systems with the highest levels of redundancy.
AWS, for example, defines data classification as “a way to categorise organisational data based on criticality and sensitivity in order to help you determine appropriate protection and retention controls”.
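As a minimal illustration of that idea, the sketch below maps a set of hypothetical classification labels to the kinds of protection and retention controls described above. The labels, control choices and retention periods are assumptions made for illustration; they are not drawn from AWS or any other standard scheme.

```python
# A minimal sketch: mapping hypothetical classification labels to protection
# and retention controls. Labels, controls and values are illustrative
# assumptions, not a standard scheme.
from dataclasses import dataclass

@dataclass
class ProtectionProfile:
    encrypt_at_rest: bool      # encrypt the data where it is stored
    restrict_access: bool      # enforce role-based access controls
    redundant_storage: bool    # replicate to resilient storage
    retention_years: int       # how long to keep the data

# More sensitive or more critical data gets stronger controls.
CONTROLS = {
    "public":       ProtectionProfile(False, False, False, 1),
    "internal":     ProtectionProfile(True,  False, False, 3),
    "confidential": ProtectionProfile(True,  True,  True,  7),
    "restricted":   ProtectionProfile(True,  True,  True,  10),
}

print(CONTROLS["confidential"])
```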
However, data protection measures can be costly, both in cash terms and in the added complexity they can bring to workflows. Not all data is equal, and few firms have bottomless IT budgets when it comes to data protection.
But a clear data classification policy should ensure compliance and optimise costs – and it can also help organisations make more effective use of their data.
What is data classification used for?
Data classification policies are one of the Swiss Army knives of the IT toolbox.
Organisations use their policies as part of their business continuity and disaster recovery planning, including setting backup priorities.
They use them to ensure compliance with regulations such as GDPR, PCI-DSS and HIPAA.
These policies are fundamental to effective data security, setting rules for encryption, data access, and even who can amend or delete information.
Data classification policies are also a key part of controlling IT costs, through storage planning and optimisation. This is increasingly important, as organisations store their data in the public cloud with its consumption-based pricing models.
But it is also essential to match the right storage technologies to the right data, from high-performance flash storage for transactional databases to tape for long-term archiving. Without this, firms cannot match storage performance, and the associated compute and networking costs, to data criticality.
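A simple sketch of that matching exercise might look like the following; the tier names and relative costs are assumptions for illustration, not vendor pricing.

```python
# Sketch: pairing illustrative data classifications with storage technologies.
# Tier names and relative costs are assumptions for illustration only.
STORAGE_TIERS = {
    # classification: (storage technology, relative cost per TB)
    "transactional-critical": ("high-performance flash", 10.0),
    "operational":            ("standard block or object storage", 3.0),
    "archive":                ("tape or cold object storage", 0.5),
}

def describe_tier(classification: str) -> str:
    technology, relative_cost = STORAGE_TIERS[classification]
    return f"{classification}: {technology} (~{relative_cost}x relative cost)"

for classification in STORAGE_TIERS:
    print(describe_tier(classification))
```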
In fact, with organisations looking to drive more value from their information, data classification has another role – helping to build data mining and analytics capabilities.
“The topic of data management has crept up in importance among the leadership teams of many organisations over the past few years,” says Alastair McAulay, an IT strategy expert at PA Consulting.
“There are two big drivers for this. The first driver is a positive one, where organisations are keen to maximise the value of their data, to liberate it from individual systems and place it where it can be accessed by analytics tools to create insight, to improve business performance.
“The second driver is a negative one, where organisations discover how valuable their data is to other parties.”
Organisations need to protect their data, not just against exfiltration by malicious hackers, but against ransomware attacks, intellectual property theft and even the misuse of data by otherwise-trusted third parties. As McAulay cautions, firms cannot control this unless they have a robust system for labelling and tracking data.
What do data classification policies take into account?
Effective data classification policies start out with the three basic principles of data management:
- Confidentiality.
- Integrity.
- Availability.
This “CIA model” or triad is most often associated with data security, but it is also a useful starting point for data classification.
Confidentiality covers security and access controls – ensuring only the right people view data – and measures such as data loss prevention.
Integrity ensures that data can be trusted throughout its lifecycle. This includes backups, secondary copies and volumes derived from the original data, such as those created by a business intelligence application.
Availability includes hardware and software measures such as business continuity and backup and recovery, as well as system uptime and even ease of access to the data for authorised users.
CIOs and chief data officers will then want to extend these CIA principles to fit the specific needs of their organisations and the data they hold.
This will include more granular information on who should be able to view or amend data, extending to which applications can access it, for example through application programming interfaces (APIs). But data classification will also set out how long the data should be retained, on which storage systems it should be held, how often it should be backed up, and when it should be archived.
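A minimal sketch of what one such per-classification rule set might look like follows; the field names, roles, retention periods and backup intervals are all illustrative assumptions.

```python
# Sketch: a per-classification policy record covering access, retention,
# storage placement, backup frequency and archiving. All names and values
# are illustrative assumptions, not recommendations.
from dataclasses import dataclass

@dataclass
class ClassificationPolicy:
    label: str
    viewers: list               # roles allowed to view the data
    editors: list               # roles allowed to amend or delete it
    api_consumers: list         # applications allowed to access it via APIs
    retention_days: int         # how long the data should be retained
    storage_tier: str           # where it should be stored
    backup_interval_hours: int  # how often it should be backed up
    archive_after_days: int     # when it should move to archive storage

customer_records = ClassificationPolicy(
    label="confidential",
    viewers=["customer-service", "finance"],
    editors=["finance"],
    api_consumers=["crm-app"],
    retention_days=7 * 365,
    storage_tier="encrypted, replicated flash",
    backup_interval_hours=4,
    archive_after_days=2 * 365,
)
print(customer_records)
```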
“A good data backup policy may well rely on a data map so that all data used by the organisation is located and identified and therefore included in the relevant backup process,” says Stephen Young, director at data protection supplier AssureStor. “If disaster strikes, not everything can be restored at once.”
What are the key elements of a data classification policy?
One of the more obvious data classification examples is where organisations hold sensitive government information. This data carries protective markings – in the UK, these range from “official” to “top secret” – which data management and data protection tools can then act on.
Firms might want to emulate this by creating their own classifications, for example by separating out financial or health data that has to comply with specific industry regulations.
Or firms might want to create tiers of data based on confidentiality – around R&D or financial deals, for example – or on how important the data is to critical systems and business processes. Unless organisations have a classification policy in place, they will not be able to create rules to deal with the data in the most appropriate way.
A good data classification policy “paves the way for improvements to efficiency, quality of service and greater customer retention” if it is used effectively, says Fredrik Forslund, vice-president – international at data protection firm Blancco.
A robust policy also helps organisations to deploy tools that take much of the overhead out of data lifecycle management and compliance. Amazon Macie, for example, uses machine learning and pattern matching to scan data stores for sensitive information. Meanwhile, Microsoft has an increasingly comprehensive set of labelling and classification tools across Azure and Microsoft 365.
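The pattern-matching part of that approach can be sketched in a few lines; the example below is a deliberately crude regex scan with made-up patterns, and does not reflect how Amazon Macie or Microsoft’s tools are built or invoked.

```python
# Crude sketch of pattern-matching classification: flag text that appears to
# contain sensitive identifiers. The patterns and labels are illustrative
# assumptions; real tools use far richer detection techniques.
import re

PATTERNS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "uk national insurance number": re.compile(r"\b[A-Z]{2}\d{6}[A-D]\b"),
    "payment card (16 digits)": re.compile(r"\b(?:\d[ -]?){16}\b"),
}

def classify_text(text: str) -> list:
    """Return the labels of any sensitive patterns found in the text."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(text)]

sample = "Contact jane.doe@example.com, card 4111 1111 1111 1111."
print(classify_text(sample))  # ['email address', 'payment card (16 digits)']
```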
However, when it comes to data classification, the tools are only as good as the policies that drive them. With boards’ increasing sensitivity to data and IT-related risks, organisations should look at the risks associated with the data they hold, including the risks posed by data leaks, theft or ransomware.
These risks are not static: they will evolve over time, so data classification policies also need to be flexible. But a properly designed policy will help with compliance, and with costs.
What are the benefits of data classification?
There is no avoiding the fact that creating a data classification policy can be time-consuming, and it requires technical expertise from areas including IT security, storage management and business continuity. It also needs input from the business to classify data, and ensure legal and regulatory compliance.
But, as experts working in the field say, a policy is needed to ensure security and control costs, and to enable more effective use of data in business planning and management.
“Data classification helps organisations reduce risk and enhance the overall compliance and security posture,” says Stefan Voss, a vice-president at IT management tool company N-able. “It also helps with cost containment and profitability due to reduction of storage costs and greater billing transparency.”
Also, data classification is a cornerstone of other policies, such as data lifecycle management. And it helps IT managers create effective recovery time objectives (RTOs) and recovery point objectives (RPOs) for their backup and disaster recovery plans.
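As a final sketch, recovery objectives might be derived from classification tiers along the following lines; the tier names and targets are illustrative assumptions rather than benchmarks.

```python
# Sketch: deriving backup and recovery targets from illustrative classification
# tiers. RTO is how quickly data must be restored; RPO is how much recent data
# loss is tolerable. The tiers and numbers are assumptions for illustration.
RECOVERY_OBJECTIVES = {
    # classification: (RTO in hours, RPO in hours)
    "business-critical": (1, 0.25),
    "important": (8, 4),
    "archival": (72, 24),
}

for tier, (rto, rpo) in RECOVERY_OBJECTIVES.items():
    print(f"{tier}: restore within {rto}h, accept at most {rpo}h of data loss")
```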
Ultimately, organisations can only be effective in managing their data if they know what they have, and where it is. As PA Consulting’s McAulay says: “Tools will only ever be as effective as the data classification that underpins them.”