When enterprises talk about data protection, they usually think of documents, but few pay attention to what those documents contain. Data management used to be correspondingly simple: everything was encrypted, everything required authorization, and documents were not distinguished by importance. As document formats have multiplied and security incidents have kept breaking out, the way people view data has changed. Data is now divided into structured and unstructured data, and more attention is paid to the sensitive information inside documents and to how each document is used. Different types of documents, and documents with different content, are managed and stored differently.
In the past, data was mostly controlled by force: everything was isolated, or everything was encrypted. We call this cage-style or yoke-style control. In the real production, use, and circulation of data it caused a great deal of unnecessary trouble, and people needed a more flexible way to handle data. Intelligent data security control emerged in response: business administrators can now apply controls targeted to how important each piece of data is.
Contents
- Core capabilities of data leakage prevention
- DLP deployment
- General data leakage prevention technologies
- Basic detection techniques
- Advanced detection techniques
- Exact data matching
- Document fingerprint matching
- Support vector machine classification
- Data leakage prevention control and encryption technologies
- Device filter driver technology
- File-level intelligent dynamic encryption and decryption
- Network-level intelligent dynamic encryption and decryption
- Disk-level intelligent dynamic encryption and decryption
- Evolution of data leakage prevention products
- Cage-type DLP products
- Yoke-type DLP products
- Monitoring DLP products
- Smart DLP products
Core capabilities of data leakage prevention
What is DLP? The acronym stands for "Data Leakage (Loss) Prevention". Its core capability is content identification, and control over the data is built on top of that identification. Content identification should be able to recognize specific keywords, regular expressions, document fingerprints, exact data sources (database fingerprints), and support vector machine classifications, and each of these capabilities can be combined into a variety of composite capabilities.
DLP must also be able to protect data, both on the network and on endpoints. Network protection focuses mainly on auditing and control. Endpoint protection covers auditing and control as well, but should also include traditional host control, encryption, and permission control.
In short, DLP is a composite system. Its end goal is intelligent discovery, intelligent encryption, intelligent control, and intelligent auditing, delivered as a complete data leakage protection solution.
DLP deployment
The following figure shows the physical configuration of DLP and where the different product types reside within an organization. Network DLP products reside in the DMZ, while the other products reside on the corporate LAN or in the data center. Apart from Endpoint DLP, all of these products are server-based.
General data leakage prevention technologies
To prevent data loss, every type of confidential data must be detected accurately, wherever it is stored, copied, or transmitted. Without accurate detection, a data security system produces many false positives (messages or files that do not violate policy are flagged as violations) and false negatives (messages or files that do violate policy are missed). False positives cost considerable time and resources to investigate and resolve incidents that only appear to be real. False negatives hide security gaps, leading to data loss, potential financial loss, legal risk, and damage to the organization's reputation. Protection therefore depends on accurate detection. To achieve the highest accuracy, DLP uses three basic detection techniques and three advanced detection techniques.
Basic detection techniques
There are usually three basic techniques: regular expression (identifier) detection, keyword and keyword-pair detection, and document attribute detection. They rely on conventional content search and matching. Regular expressions and keywords are the most common, and both can detect sensitive content that has a well-defined form. Document attribute detection looks at a document's type, size, and name. Type detection works from the actual file format rather than the file extension, so it still identifies the real type when the extension has been changed. It currently supports more than 100 standard file types, and custom signatures can be defined to recognize documents in special formats.
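A minimal sketch of the three basic techniques, using Python's standard `re` module. The identifier patterns (an 18-character PRC resident ID number and an 11-digit mainland mobile number), the keyword lists, and the small table of file signatures are illustrative assumptions, not any product's rule set; real products ship far larger, locale-specific libraries.

```python
import re

# Illustrative identifier patterns (assumptions, not a vendor rule set).
PATTERNS = {
    # 18-character PRC resident ID: 17 digits, then a digit or check character X
    "cn_id_number": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
    # 11-digit mainland mobile number beginning with 1
    "cn_mobile": re.compile(r"(?<!\d)1\d{10}(?!\d)"),
}

KEYWORDS = {"confidential", "merger"}
KEYWORD_PAIRS = {("salary", "employee")}  # a pair matches only if both words occur

# File type is decided from leading "magic" bytes, not from the extension.
MAGIC = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip container (docx/xlsx/pptx/zip)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2 (doc/xls/ppt)"),
    (b"\x89PNG\r\n\x1a\n", "png"),
]


def detect_file_type(head: bytes) -> str:
    """Return a file type from the first bytes of a file, ignoring its name."""
    for signature, name in MAGIC:
        if head.startswith(signature):
            return name
    return "unknown"


def scan_text(text: str) -> dict:
    """Run regular-expression, keyword, and keyword-pair detection on text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {
        "identifiers": {name: p.findall(text) for name, p in PATTERNS.items() if p.search(text)},
        "keywords": sorted(k for k in KEYWORDS if k in words),
        "keyword_pairs": sorted(pair for pair in KEYWORD_PAIRS if set(pair) <= words),
    }


report = scan_text("Confidential: employee salary list, ID 110001198107011533, tel 13333333333")
print(report)
print(detect_file_type(b"%PDF-1.7 ..."))  # a PDF renamed to .txt is still reported as pdf
```

The lookarounds `(?<!\d)` and `(?!\d)` keep the 11-digit phone pattern from firing on a substring of the 18-digit ID number, which is the kind of tuning that separates usable identifier rules from noisy ones.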
Advanced detection techniques
There are also three advanced techniques: exact data matching (EDM), document fingerprint matching (IDM), and support vector machine (SVM) classification. EDM protects data that is usually structured, such as customer or employee database records. IDM and SVM protect unstructured data, such as Microsoft Word or PowerPoint documents. With EDM, IDM, and SVM, the enterprise first identifies its sensitive data, and DLP then extracts that data's characteristics so that it can perform accurate, continuous detection. Extracting the characteristics involves DLP accessing and retrieving the text and data, normalizing it, and protecting it with irreversible hashing.
DLP detection is based on the actual confidential content, not on the file that contains it. As a result, DLP can detect not only sensitive search terms and their variants, but also sensitive data whose file format or layout differs from the original. For example, once the characteristics of a confidential Microsoft Word document have been learned, DLP can detect the same content exactly when it is sent by e-mail as a PDF attachment.
Exact data matching
Exact Data Matching (EDM) protects customer and employee data, as well as other structured data that is usually stored in a database. For example, a customer might write a policy that uses EDM to detect any message containing a "name", "ID number", "bank account number", or "telephone number" that matches a record in the customer database.
EDM can detect any combination of the columns in a record; that is, it can trigger when N of the M fields in a particular record are present. It can also trigger only on specified "value groups" or sets of data types. For example, a policy can trigger on the combination of name and ID number but not on the combination of name and phone number.
Because a separate hash is stored for each data cell, only values that map to the same record can trigger a policy that looks for a combination of fields. For example, if an EDM policy requires the combination "name + ID number + phone number", then "Zhang San" + "13333333333" + "110001198107011533" can trigger it. Even if "Li Si" is also in the same database, "Li Si" + "13333333333" + "110001198107011533" cannot trigger it, because those values come from different records. EDM also supports proximity logic to reduce false positives. In free-form text processed during detection, the matched values must lie within a configurable number of words of one another to count as a match. By default, for example, in the body of an e-mail, "Zhang San", "13333333333", and "110001198107011533" must fall within the selected range of words before a match is reported. For text that contains tabular data (such as Excel spreadsheets), all the matched values must be on the same row of the table to count as a match, which further reduces false positives.
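The record-level and proximity rules above can be sketched as follows. This is a minimal illustration, not a vendor's EDM engine: it assumes SHA-256 over lowercased, whitespace-collapsed cell values, word n-grams of up to three tokens to catch multi-word names, and a proximity window of 20 tokens measured between the first occurrences of each required column. The class and method names are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def cell_hash(value: str) -> str:
    """Normalize (lowercase, collapse whitespace) and hash one cell, so the index holds no plaintext."""
    normalized = " ".join(value.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class EdmIndex:
    def __init__(self, records, columns):
        # hash -> set of (row_id, column) pairs; one entry per data cell
        self.cells = defaultdict(set)
        for row_id, record in enumerate(records):
            for column in columns:
                self.cells[cell_hash(record[column])].add((row_id, column))

    def match(self, text, required, window=20, max_ngram=3):
        """Return ids of rows whose `required` columns all appear within `window` tokens."""
        tokens = re.findall(r"\w+", text)
        hits = defaultdict(lambda: defaultdict(list))  # row -> column -> token positions
        for i in range(len(tokens)):
            for n in range(1, max_ngram + 1):  # multi-word values such as "Zhang San"
                if i + n > len(tokens):
                    break
                for row_id, column in self.cells.get(cell_hash(" ".join(tokens[i:i + n])), ()):
                    hits[row_id][column].append(i)
        matched = []
        for row_id, columns in hits.items():
            if not all(c in columns for c in required):
                continue  # values from different records never combine
            first_positions = [min(columns[c]) for c in required]
            if max(first_positions) - min(first_positions) <= window:  # proximity rule
                matched.append(row_id)
        return sorted(matched)


records = [
    {"name": "Zhang San", "id_number": "110001198107011533", "phone": "13333333333"},
    {"name": "Li Si", "id_number": "110001198502023344", "phone": "13800000000"},
]
index = EdmIndex(records, ["name", "id_number", "phone"])
policy = ["name", "id_number", "phone"]
print(index.match("Customer Zhang San, ID 110001198107011533, phone 13333333333", policy))  # [0]
print(index.match("Customer Li Si, ID 110001198107011533, phone 13333333333", policy))  # []
```

An "N of M" policy, or one that accepts name + ID number but not name + phone number, can be expressed by calling `match` once per accepted column combination. The same-row rule for spreadsheets is not modeled here.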
Document fingerprint matching
Document fingerprint matching (IDM) accurately detects unstructured data stored in documents, such as Microsoft Word and PowerPoint files, PDF documents, financial and M&A documents, and other sensitive or proprietary information. IDM builds fingerprints of a protected document so that it can detect the original document, excerpts taken from it, drafts, and different versions.
IDM first learns from the sensitive files to extract their sensitive content. It uses word segmentation and semantic analysis to build fingerprint models of the documents that need protection. It then fingerprints the document or content under inspection in the same way, compares the resulting fingerprint with the trained fingerprints, and decides whether the inspected document matches according to a preset similarity threshold. This approach gives IDM high accuracy and good scalability.
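The train, fingerprint, and compare cycle can be illustrated with a deliberately simpler technique than the word segmentation and semantic analysis described above: word shingling, where every run of k consecutive normalized words is hashed, and the score is the fraction of the inspected text's shingles found in a protected document (containment). The shingle length `k=5`, the 0.5 threshold, and the class names are assumptions for this sketch only.

```python
import hashlib
import re


def fingerprint(text: str, k: int = 5) -> set:
    """Hash every run of k consecutive words (a "shingle") after normalization."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return set()
    shingles = [" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))]
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:16] for s in shingles}


class IdmIndex:
    def __init__(self, k: int = 5):
        self.k = k
        self.prints = {}  # doc_id -> set of shingle hashes (no plaintext kept)

    def train(self, doc_id: str, text: str) -> None:
        self.prints[doc_id] = fingerprint(text, self.k)

    def check(self, text: str, threshold: float = 0.5):
        """Return (doc_id, containment) for protected docs whose shingles cover enough of `text`."""
        suspect = fingerprint(text, self.k)
        if not suspect:
            return []
        results = []
        for doc_id, protected in self.prints.items():
            containment = len(suspect & protected) / len(suspect)
            if containment >= threshold:
                results.append((doc_id, round(containment, 2)))
        return sorted(results, key=lambda r: -r[1])


idm = IdmIndex()
idm.train("ma-plan", "The acquisition of Example Corp will be announced on the first of June "
                     "after the board approves the final price of twelve dollars per share.")
print(idm.check("...the board approves the FINAL price of twelve dollars per share!"))  # [('ma-plan', 1.0)]
print(idm.check("Lunch menu for Friday includes soup, bread and fresh fruit."))  # []
```

Containment rather than symmetric similarity is used here because the goal stated in the text is to catch excerpts: a short passage copied from a long protected document still scores high.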
Support vector machine classification
Support vector machines (SVMs) were proposed by Vapnik et al. in 1995. As statistical learning theory developed, SVMs attracted researchers across many fields and were adopted widely within a short time. An SVM is grounded in the VC-dimension theory and the structural risk minimization principle of statistical learning theory: using the information in a limited sample, it seeks the best trade-off between model complexity and learning ability in order to obtain the best generalization. The basic idea is to map the training data nonlinearly into a higher-dimensional feature space (a Hilbert space) and to find, in that space, a hyperplane that maximizes the margin between the positive and negative examples. SVMs effectively address problems of traditional neural networks such as model selection, local minima, and overfitting, and they are widely used in pattern recognition and data mining for machine learning problems involving small samples, nonlinearity, and high-dimensional data.
SVM classification suits data whose features are subtle or hard to describe, such as financial reports and source code. In practice, documents are divided into categories by content, and each category's document set represents that class. SVM matching determines the category of the document under inspection, and with it the access permissions and policies that apply to that category. Because of this property, endpoints or servers can also classify data according to the meaning of its content.
The difference between IDM and SVM is that IDM compares the fingerprint of the document under inspection with the fingerprint of each individual file in the training set, whereas SVM converts the document under inspection into a vector and assigns it to the vector space of one of the trained classes.
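A minimal sketch of SVM-based document classification. It makes several simplifying assumptions that differ from the kernel SVM described above: it uses a linear SVM without a kernel, trains it with the Pegasos stochastic sub-gradient algorithm (no bias term, as in Pegasos's basic form), represents documents as L2-normalized bag-of-words vectors, and handles several classes with one-vs-rest. The category names, training documents, `lam`, and `epochs` are all made up for illustration.

```python
import math
import re
from collections import Counter


def features(text: str) -> dict:
    """Bag-of-words term counts, L2-normalized, as a sparse vector."""
    counts = Counter(re.findall(r"[a-z_]+", text.lower()))
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {word: v / norm for word, v in counts.items()}


def dot(w: dict, x: dict) -> float:
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_binary(xs, ys, lam=0.01, epochs=50):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss (no bias term)."""
    w, t = {}, 0
    for _ in range(epochs):
        for x, y in zip(xs, ys):  # fixed order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * dot(w, x)  # evaluated with the current weights
            for k in w:
                w[k] *= 1.0 - eta * lam  # shrink step from the regularizer
            if margin < 1:  # hinge loss is active: move toward the example
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


class LinearSvmClassifier:
    """One-vs-rest linear SVM: one weight vector per document class."""

    def fit(self, docs, labels):
        xs = [features(d) for d in docs]
        self.classes = sorted(set(labels))
        self.weights = {
            c: train_binary(xs, [1 if label == c else -1 for label in labels]) for c in self.classes
        }
        return self

    def predict(self, doc: str) -> str:
        x = features(doc)
        return max(self.classes, key=lambda c: dot(self.weights[c], x))


DOCS = [
    "quarterly revenue net profit balance sheet cash flow statement",
    "annual report revenue growth operating margin profit",
    "balance sheet assets liabilities equity quarterly earnings",
    "def function return import class self",
    "public static void main string args class",
    "int main include return void pointer",
    "team lunch on friday please bring snacks",
    "office holiday schedule and parking notice",
    "welcome new colleagues to the company picnic",
]
LABELS = ["financial_report"] * 3 + ["source_code"] * 3 + ["general"] * 3

clf = LinearSvmClassifier().fit(DOCS, LABELS)
print(clf.predict("quarterly revenue profit balance sheet"))  # financial_report
```

This mirrors the IDM/SVM distinction in the text: `predict` never compares against individual training files, only against one learned weight vector per class, and the predicted class can then be mapped to the permissions and policies for that category.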
Data leakage prevention control and encryption technologies
Device filter driver technology
Device filter driver programming can provide security protection and control for any endpoint device (USB ports, printers, optical drives, floppy drives, infrared, Bluetooth, network cards, and so on). It automatically identifies hardware information, the user, storage versus non-storage devices, authorized versus unauthorized devices, and other attributes.
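In a real product the decision is enforced inside a kernel-mode device filter driver, which is not shown here. The sketch below models only the policy logic such a driver might consult once it has identified a device. The device classes, the serial-number whitelist, and the "administrators get read-only access to unregistered storage" rule are hypothetical examples.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    READ_ONLY = "read_only"
    BLOCK = "block"


@dataclass(frozen=True)
class Device:
    device_class: str  # e.g. "usb_storage", "printer", "bluetooth", "hid"
    vendor_id: str
    product_id: str
    serial: str


@dataclass
class DevicePolicy:
    authorized_serials: frozenset  # devices registered by the company
    class_defaults: dict  # device_class -> Action for unregistered devices
    default: Action = Action.BLOCK  # unknown device classes are blocked

    def decide(self, dev: Device, user_is_admin: bool = False) -> Action:
        if dev.serial in self.authorized_serials:
            return Action.ALLOW  # authorized device
        action = self.class_defaults.get(dev.device_class, self.default)
        # Hypothetical rule: admins may read unregistered storage but not write to it.
        if action is Action.BLOCK and user_is_admin and dev.device_class == "usb_storage":
            return Action.READ_ONLY
        return action


policy = DevicePolicy(
    authorized_serials=frozenset({"SN-001"}),
    class_defaults={"hid": Action.ALLOW, "printer": Action.ALLOW, "usb_storage": Action.BLOCK},
)
print(policy.decide(Device("usb_storage", "0781", "5567", "SN-999")))  # Action.BLOCK
```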
File-level intelligent dynamic encryption and decryption
File-level encryption uses file system filter driver programming to intercept file system read and write requests in real time, track files dynamically, and encrypt and decrypt them transparently. Its main advantages are: encryption and decryption are dynamic and transparent and do not change the user's working habits; the performance impact is small, so the system keeps running efficiently; the original file format and state are unchanged; and deployment and day-to-day use are convenient.
Its notable characteristics are mandatory encryption, transparent use, confidentiality, application independence, flexibility, and extensibility. The technology has gone through three stages: single-cache filter drivers, dual-cache filter drivers, and virtual file system technology (LayerFSD). Most kernel-level encryption vendors on the commercial market today use single-cache filter drivers, a small number have moved on to dual-cache filter drivers, and only a handful have developed and shipped products based on virtual file system technology (LayerFSD).
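The transparency idea can be shown in user space: the caller reads and writes plaintext, while the storage layer only ever holds ciphertext. This is a minimal sketch under loud assumptions: a Python dictionary stands in for the disk, the `TENC1` header marking managed files is invented, and the cipher is a TOY keystream built from SHA-256, used only because it needs nothing outside the standard library. It is not secure; a real product would use a vetted cipher such as AES inside a kernel file system filter driver.

```python
import hashlib
import os

HEADER = b"TENC1"  # hypothetical marker that identifies a managed (encrypted) file


def toy_keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """TOY cipher: XOR data with SHA-256(key | nonce | block counter). Not for real use."""
    out = bytearray()
    for block in range((len(data) + 31) // 32):
        pad = hashlib.sha256(key + nonce + block.to_bytes(8, "big")).digest()
        chunk = data[block * 32:(block + 1) * 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


class TransparentCryptoLayer:
    """Plays the role of a file system filter driver sitting above a (simulated) disk."""

    def __init__(self, key: bytes):
        self.key = key
        self.disk = {}  # path -> bytes actually stored; a stand-in for the real file system

    def write(self, path: str, plaintext: bytes) -> None:
        # Intercepted write: the application hands over plaintext, ciphertext is stored.
        nonce = os.urandom(16)
        self.disk[path] = HEADER + nonce + toy_keystream_xor(self.key, nonce, plaintext)

    def read(self, path: str) -> bytes:
        # Intercepted read: stored ciphertext is decrypted before the application sees it.
        raw = self.disk[path]
        if not raw.startswith(HEADER):
            return raw  # unmanaged file: passed through unchanged
        nonce = raw[len(HEADER):len(HEADER) + 16]
        return toy_keystream_xor(self.key, nonce, raw[len(HEADER) + 16:])


layer = TransparentCryptoLayer(key=b"k" * 32)
layer.write("plan.docx", b"quarterly plan: confidential")
print(layer.read("plan.docx"))  # b'quarterly plan: confidential'
```

From the application's point of view the file contents are unchanged; the header and nonce exist only in the stored bytes, which is one possible design for telling managed files from unmanaged ones.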
Network-level intelligent dynamic encryption and decryption
Network-level protection uses network filter driver programming, commonly NDIS and TDI technology, to filter and control data at both the network transport protocol level and the application protocol level. This technology is currently used mainly in firewalls, VPNs, network access control, and related areas.
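NDIS and TDI hooks live in Windows kernel drivers and are not reproduced here. The sketch below shows only the kind of per-payload decision such a filter might call: trusted destinations pass, plaintext application protocols recognized by destination port are inspected, and payloads containing an ID number are blocked. The port table, the single content rule, and the allow/audit/block outcomes are assumptions; encrypted traffic such as HTTPS cannot be inspected this way without decrypting it first.

```python
import re
from dataclasses import dataclass

# Illustrative content rule: an 18-character PRC resident ID number in the payload bytes.
ID_NUMBER = re.compile(rb"(?<!\d)\d{17}[\dXx](?!\d)")

# Hypothetical map from well-known destination ports to plaintext application protocols.
WATCHED_PORTS = {21: "ftp", 25: "smtp", 80: "http"}


@dataclass(frozen=True)
class Connection:
    dest_host: str
    dest_port: int


def inspect_outbound(conn: Connection, payload: bytes, allowed_hosts: frozenset) -> str:
    """Decision a network filter might take for one outbound payload: allow, audit, or block."""
    if conn.dest_host in allowed_hosts:
        return "allow"  # trusted destination, e.g. an internal mail relay
    protocol = WATCHED_PORTS.get(conn.dest_port)
    if protocol is None:
        return "allow"  # protocol this sketch does not parse
    if ID_NUMBER.search(payload):
        return f"block:{protocol}"
    return "audit"  # watched protocol, nothing sensitive found: log it


smtp = Connection("mail.example.org", 25)
print(inspect_outbound(smtp, b"ID 110001198107011533", frozenset()))  # block:smtp
```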
Disk-level intelligent dynamic encryption and decryption
Disk-level protection uses disk filter driver programming, also known as full disk encryption (FDE). Its core works at the lowest layer of the operating system and encrypts all data on the hard disk, including the operating system files.
Because encryption is applied to physical sectors, everything stored on the disk is encrypted. Unlike file encryption, disk encryption can encrypt any data on the disk, including the operating system itself. An unauthorized user cannot read the contents of any file on the disk and cannot even see the names of the files stored on it. File-level encryption usually still exposes file names, access times, and similar information, and some content can even leak through temporary files or the swap file. With disk encryption, all data on the disk stays encrypted, so someone who obtains the encrypted disk cannot extract any information: inside an encrypted partition there is no visible notion of files at all, let alone file names or contents.
To keep operation convenient and avoid changing the user's habits, disk encryption works dynamically. An encryption and decryption program installed between the operating system and the disk automatically encrypts data as it is written to the disk and decrypts it as it is read, without user intervention. During normal use, the user is not even aware that the program exists.
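A minimal sketch of sector-level encryption between the operating system and the disk. It assumes 512-byte sectors and, again, a TOY keystream (SHA-256 of the key, the sector number, and a counter) so that it runs with the standard library alone. The sector number acts as a tweak, so identical plaintext in different sectors produces different ciphertext. Real FDE commonly uses AES in XTS mode; this toy reuses the same pad whenever a sector is rewritten and is therefore insecure. Because whole sectors are encrypted, file-system metadata such as directory entries and file names is encrypted along with the file contents.

```python
import hashlib

SECTOR_SIZE = 512


def toy_sector_pad(key: bytes, sector_no: int) -> bytes:
    """TOY keystream tweaked by sector number; real FDE typically uses AES-XTS instead."""
    pad, counter = b"", 0
    while len(pad) < SECTOR_SIZE:
        pad += hashlib.sha256(key + sector_no.to_bytes(8, "big") + counter.to_bytes(4, "big")).digest()
        counter += 1
    return pad[:SECTOR_SIZE]


class EncryptedDisk:
    """Sits between the OS and the raw disk: every sector is encrypted, metadata included."""

    def __init__(self, key: bytes, sectors: int):
        self.key = key
        self.raw = bytearray(SECTOR_SIZE * sectors)  # what someone holding the disk would see

    def write_sector(self, n: int, data: bytes) -> None:
        if len(data) != SECTOR_SIZE:
            raise ValueError("sector writes must be exactly one sector")
        pad = toy_sector_pad(self.key, n)
        self.raw[n * SECTOR_SIZE:(n + 1) * SECTOR_SIZE] = bytes(a ^ b for a, b in zip(data, pad))

    def read_sector(self, n: int) -> bytes:
        pad = toy_sector_pad(self.key, n)
        chunk = self.raw[n * SECTOR_SIZE:(n + 1) * SECTOR_SIZE]
        return bytes(a ^ b for a, b in zip(chunk, pad))


disk = EncryptedDisk(b"secret-key", sectors=4)
entry = b"SALARY  XLS".ljust(SECTOR_SIZE, b"\x00")  # a directory entry holding a file name
disk.write_sector(1, entry)
print(disk.read_sector(1) == entry, b"SALARY" in bytes(disk.raw))  # True False
```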
Evolution of data leakage prevention products
Cage-type DLP products
The main feature of this stage is strong device control, using logical isolation to build secure isolated containers.
Since 2000, foreign security management products have flowed into China, first as concepts and gradually as products; well-known vendors include Symantec and LANDesk. From 2005 to 2008, their share of the Chinese market reached 80%. After 2008, domestic products entered the market, and foreign endpoint management products have since been largely replaced by domestic ones. Although the market is now saturated, endpoint management products still account for nearly 40 million yuan in sales each year.
Yoke-type DLP products
Products in this stage focus on strong control of documents and provide defense in depth at the source: data documents are classified, graded, encrypted, authorized, and managed.
Unlike endpoint management, data encryption and rights control products shift the focus from devices to specific data files; control is finer-grained and confidentiality is stronger. Since 2007, many strong vendors have emerged. Because national regulations require encryption products to obtain the relevant confidentiality qualifications and cryptography certification before they can be used in China, foreign products cannot be sold there at scale. Encryption and rights products still represent a market of about 1 billion yuan per year, and every industry needs data protection. The market is highly competitive, and users still worry that encryption will hold their data hostage, a concern shared worldwide. Fortunately, these products are now very mature and stable.
Monitoring DLP products
Monitoring products emphasize strong behavior auditing: they use precise keywords to audit operations on data, monitoring the creation, modification, transmission, storage, and deletion of documents.
Behavior auditing is divided into network behavior auditing and endpoint behavior auditing. Network behavior auditing effectively monitors employees' network access during working hours, while endpoint behavior auditing targets operations on key data files more precisely. Audit products coexist with and complement other network and endpoint products, and their market share remains high. However, as network and endpoint products have kept improving, stand-alone behavior audit products can no longer survive on their own, and customers have begun to favor diversified products.
Smart DLP products
Smart products pursue intelligent control: data can be identified, discovered, and managed, and common control capabilities are provided.
To control data more comprehensively, vendors built many combined solutions from endpoint management products and encryption and rights products, but these all rely on strong global control, which has limitations and cannot cope with more complex data environments. As data leaks occurred around the world, attention turned to the content of data, and content-aware DLP products emerged. They use content to determine how important data is, to classify it, and to assign it a sensitivity level, and the intelligent control model brings convenience and flexibility.
Since 2013, domestic and foreign vendors have vigorously promoted DLP, and adoption took off in the financial and telecom carrier industries. Domestic products, however, are still at an early stage, and their immaturity and instability have slowed DLP adoption in China. Many endpoint, encryption, and audit vendors have begun to shift toward DLP, but no more than three offer genuine DLP products.