Is content identification necessary for prevention of secret leakage?

By 0

Identifying an appropriate management method

Information security has traditionally focused on defense, so the deployment of firewalls and anti-virus software, and the awareness around them, is already mature. In that situation, frankly, the chance of data being compromised through brute-force intrusion or direct hacking is quite low. The insider threats that have drawn attention recently are not a new concept either; Data Loss Prevention (DLP) products emerged on the market in 2006. As defined by SANS and Gartner, a DLP product should be able to identify the content of the data it filters and handle it appropriately. This is easy to understand, because you cannot prevent what you have not identified. But is identification always appropriate, or even essential? That question has to be analyzed against the content an organization actually works with.

The difficulty in content identification lies in “management”.

Let’s start with the methods of content identification. Vendors at home and abroad have developed a variety of methods, each with its own achievements. When we evaluate such systems, the first factor we consider is accuracy, and of course the higher the better. Obviously, high accuracy comes at a high price. Spam filtering, which most of us are familiar with, illustrates this. However good the filter, the best accuracy we know of is around 80%. Why? Because what counts as junk mail is inherently ambiguous. If you work in the life insurance industry, an insurance brochure or promotional mail is not junk, yet most other people would not welcome it. The answer therefore depends on your position, title, and industry, and even human reviewers may reach different conclusions. Accuracy can be improved with additional signals such as the sender, network domain, and recipient, but the underlying ambiguity of the problem does not go away.

DLP offers several identification methods, such as fingerprinting (of files), tagging, or a combination of the two. In general, DLP performs targeted identification. For example, the data covered by HIPAA and PCI-DSS is structured, so within that scope ambiguity is low and accuracy is much better. Personal data identification works the same way: various data items are collected and judged against rules. If confidential content must not leak from the company, then no part of it may be quoted, copied, saved, or printed either. The DLP system therefore also has to evaluate fragments of content, which naturally adds computing overhead. According to the performance guide of one large DLP vendor, copying a file smaller than 1 MB (local to USB, local to Samba) incurs an average latency of about 2-3 s, uploading a 1 MB file over HTTP/HTTPS through Chrome or Firefox incurs about 7-11 s, and printing a file incurs about 7 s on average. That overhead is indeed high. (Client hardware and software as stated by the vendor: Windows 10 x64, 8 GB RAM, Core i7 3.6 GHz.)
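
To make the fragment problem concrete, here is a minimal sketch of one common way to fingerprint a document so that partial copies can be recognized: hashing every run of k consecutive words ("shingles"). This is an illustration under our own assumptions, not the algorithm of any particular DLP product; the function names, the window size of four words, and the sample text are all hypothetical.

```python
import hashlib
import re


def shingle_hashes(text, k=4):
    """Hash every run of k consecutive words so fragments can be matched later."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Text shorter than one window: treat the whole text as a single shingle.
        windows = [" ".join(words)] if words else []
    else:
        windows = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return {hashlib.sha256(w.encode("utf-8")).hexdigest() for w in windows}


def fragment_overlap(protected_hashes, outgoing_text, k=4):
    """Share of the outgoing text's shingles that also occur in the protected document."""
    outgoing = shingle_hashes(outgoing_text, k)
    if not outgoing:
        return 0.0
    return len(outgoing & protected_hashes) / len(outgoing)


# Hypothetical example data, not taken from any real DLP product.
protected = ("The quarterly forecast for Project Falcon shows revenue of "
             "twelve million dollars in the Asian market")
registered = shingle_hashes(protected)

excerpt = "FYI, Project Falcon shows revenue of twelve million dollars. See you at lunch"
print(fragment_overlap(registered, excerpt))
```

In this sample, half of the excerpt's four-word windows match the registered document, even though the excerpt is not a copy of the file. The sketch also hints at where the overhead comes from: every copy, upload, or print operation has to be broken into windows, hashed, and compared before it can be released.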

Before any of this, management has to decide how to develop the rules and policies. Although some vendors let end users define rules themselves, those rules inevitably have to be reviewed and adjusted again and again. A management unit can hardly grasp the rules accurately when dealing with the content of other business units; even units in the same line of business may not fully grasp them. Still, this is not the hardest part. Once everything is on track, what remains is exception management. Whether viewed from the system side or the administrative side, personal data is involved from beginning to end. For some staff, processing personal data is their entire job. These are operational costs that must be faced, and they are hidden management costs.

Another indicator is easily overlooked. We will not use a technical term for it and will simply call it “misplacement”: a document contains personal data, but the system fails to detect it and lets it through. When this happens, nobody knows, precisely because the system did not detect anything. If a data security incident later requires formal evidence, there is little to provide. This situation has to be improved. What needs investigating is under which circumstances personal data goes undetected, so that the rules can be strengthened accordingly. As noted above, however, we are dealing with the unknown, which makes this difficult. The misjudgment rate, on the other hand, refers to documents that contain no personal data but are flagged as containing it. Such cases are mostly dealt with through exception handling and do not greatly affect overall operation.
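
The two indicators can be put side by side with a small calculation. Because the system never logs what it misses, the sketch below assumes the counts come from a manually labelled sample of documents; that audit step, the function name, and all of the numbers are our own hypothetical illustration, not figures from any vendor.

```python
def detection_rates(caught, missed, false_alarms, clean_passed):
    """Return (miss_rate, false_alarm_rate) from a manually labelled sample.

    caught / missed: documents that do contain personal data, split by
    whether the system flagged them.
    false_alarms / clean_passed: documents without personal data, split
    by whether the system flagged them.
    """
    with_data = caught + missed
    without_data = false_alarms + clean_passed
    miss_rate = missed / with_data if with_data else 0.0
    false_alarm_rate = false_alarms / without_data if without_data else 0.0
    return miss_rate, false_alarm_rate


# Hypothetical audit of 1,000 documents, labelled by hand.
miss_rate, false_alarm_rate = detection_rates(
    caught=180, missed=20, false_alarms=40, clean_passed=760
)
print(f"misplacement (missed) rate: {miss_rate:.1%}")
print(f"misjudgment (false alarm) rate: {false_alarm_rate:.1%}")
```

The false alarms show up in daily operation as exceptions to be handled, while the missed documents only become visible through this kind of separate review.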

What kind of enterprise needs content identification, and what kind should consider DRM?

Regulatory compliance requirements differ from industry to industry. HIPAA and PCI-DSS, mentioned above, require content to be identified, so content filtering is essential. If sensitive data is normally exchanged only internally, content judgment matters much less. The requirement is then simply that the content must not leak, and DRM may be more appropriate. DRM applies encryption and rights control to the document or file itself, so even if it leaks, it cannot be used without the rights. With DRM in place, one could say that the leakage channel no longer matters.

To draw a comparison with real life, it is also reasonable for the objective to be identifying the person who stole confidential files. Even if the file is encrypted and unusable, the theft itself still needs to be evidenced. A system that combines endpoint data monitoring with DRM is therefore worth considering. Endpoint data monitoring can record various operations and file accesses, and even capture video, and it provides data security control to a great extent. If content does not need to be judged, this is a good option.
