What is big data?
Today, big data platforms, big data analytics, and big data applications are developing in full swing. Our lives are surrounded by data of every kind: pictures, sounds, videos, text, and more. Have you ever wondered: What is big data? How has big data changed our lives? How can I use big data in network security? And much more.
Let us gradually unveil the mystery of big data: how it reflects human thinking and behavior, what characteristics it has, and what business and social value it offers.
What is big data?
When people first encounter big data, the most common question is: how big does data have to be to count as big data? 1 PB? 1 EB? Or more?
No single number answers this, because what defines big data is not size alone: big data is comprehensive, complete, and systematic. It frees us from depending on samples and partial data. By computing statistics over a large volume of data as a whole, we can uncover the meaning hidden deep within it and put that meaning to work. Big data has already made its mark in life safety, network security, online marketing, and other fields, and it is playing an increasingly important role.
What are the characteristics of big data?
- Not random samples, but all data
In the past, because our tools for recording, storing, and analyzing data were limited, we could collect only small amounts of data for analysis, which was frustrating. To keep the analysis manageable, we reduced the amount of data to a minimum, and sampling became the best option.
Today, however, cloud computing and distributed storage are in widespread use, and storing and analyzing data has become much simpler. We no longer have to rely on a small sample; instead we can analyze as much of the data as possible, ideally all of it, and find problems faster and more easily. For example, consider big data and Steve Jobs's cancer treatment. Apple's legendary CEO Steve Jobs took a different approach in his fight against cancer: he became one of the first people in the world to have both his own entire DNA and his tumor's DNA sequenced. What he received was not a sample covering only a set of markers, but a data file containing his entire genetic code.
For an ordinary cancer patient, the doctor can only hope that the patient's DNA is similar enough to the samples used in the drug trials. Steve Jobs's doctors, however, could choose medications based on his specific genetic makeup and the desired effect. If the cancer stopped responding to one drug, the doctors could switch to another in time, which Jobs described as “jumping from one lily pad to another.” He joked: “I am either going to be one of the first people to outrun a cancer like this, or one of the last to die from it.” Although Steve Jobs has passed away, this approach of using all the data rather than a sample may have extended his life by several years.
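To make the contrast between sampling and using all the data concrete, here is a minimal Python sketch on a hypothetical dataset: 100,000 records of which 10 are rare anomalies (think of suspicious logins in a network-security log). The function names `make_records`, `count_full`, and `estimate_from_sample`, and all the numbers, are illustrative assumptions, not taken from any real system. A full scan finds every anomaly, while a 1% random sample will most likely find none at all, so its scaled-up estimate is very coarse.

```python
import random
from typing import List, Tuple


def make_records(n: int, n_anomalies: int, seed: int = 0) -> List[bool]:
    """Hypothetical dataset: True marks a rare anomalous record."""
    rng = random.Random(seed)
    records = [False] * n
    # Pick distinct positions for the anomalies.
    for idx in rng.sample(range(n), n_anomalies):
        records[idx] = True
    return records


def count_full(records: List[bool]) -> int:
    """The 'all data' approach: examine every record."""
    return sum(records)


def estimate_from_sample(records: List[bool], sample_size: int,
                         seed: int = 1) -> Tuple[int, float]:
    """The sampling approach: count anomalies in a random sample, then scale up."""
    rng = random.Random(seed)
    sample = rng.sample(records, sample_size)
    found = sum(sample)
    estimate = found * len(records) / sample_size
    return found, estimate


if __name__ == "__main__":
    records = make_records(n=100_000, n_anomalies=10)
    print("full scan found:", count_full(records))
    found, estimate = estimate_from_sample(records, sample_size=1_000)
    print(f"1% sample found {found}, scaled estimate {estimate:.0f}")
```

The rarer the event, the more likely a sample is to miss it entirely; scanning every record avoids that trade-off at the cost of more storage and computation.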
- Not exactness, but messiness
Only about 5% of data is structured and fits into traditional databases. If we refuse to accept messiness, the remaining 95% of unstructured data cannot be used. Only by accepting some inexactness can we open a window onto a world we have never explored. So what is messiness?
Put simply, messiness means that as the amount of data grows, the number of errors grows with it. Big data usually speaks in probabilities rather than certainties. If we can tolerate a certain degree of error, the data will bring us more value. For example: Hadoop and VISA's 13 minutes.
Hadoop is an open-source distributed-system infrastructure, a counterpart to Google's MapReduce system, and it is very good at handling large amounts of data. It typically splits big data into small blocks and distributes them across many machines for analysis. It assumes that hardware may fail, so it keeps internal copies of the data. Typical data analysis goes through an “extract, transform, load” (ETL) process, but Hadoop does not insist on this; on the contrary, it assumes the data is too large to move in its entirety, so the data must be analyzed where it is stored. Hadoop's output is not as precise as that of a relational database, so it cannot be used for tasks that demand a high degree of accuracy, such as launching satellites or issuing bank account statements. But for tasks that do not require extreme precision, such as segmenting customers and then running a different marketing campaign for each segment, it runs much faster than other systems. The credit card company VISA used Hadoop to cut the time needed to process 73 billion transactions, two years' worth of records, from one month to 13 minutes.
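To get a rough sense of the VISA figures, assume a 30-day month (an assumption; the text says only “one month”):

$$
\frac{30 \times 24 \times 60\ \text{min}}{13\ \text{min}} = \frac{43{,}200}{13} \approx 3{,}300\times,
\qquad
\frac{73 \times 10^{9}\ \text{transactions}}{13\ \text{min}} \approx 5.6 \times 10^{9}\ \text{transactions per minute}.
$$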
That is amazing! So when a small amount of error is acceptable, Hadoop is very useful for big data processing. We can also choose different tools and methods for handling big data according to business needs and other characteristics.
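The split-and-process-locally idea behind Hadoop can be sketched in plain Python. This is a single-process illustration of the map, shuffle, and reduce phases on a toy word count, not Hadoop's actual API; the chunks merely stand in for data blocks stored on different machines, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def split_into_chunks(records: List[str], n_chunks: int) -> List[List[str]]:
    """Split records into n_chunks interleaved parts (stand-ins for blocks on different nodes)."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    return [records[i::n_chunks] for i in range(n_chunks)]


def map_phase(chunk: Iterable[str]) -> List[Tuple[str, int]]:
    """Emit (word, 1) for every word; in a cluster this runs where the chunk is stored."""
    return [(word, 1) for line in chunk for word in line.lower().split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group the emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine all values for each key into a final count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records: List[str], n_chunks: int = 3) -> Dict[str, int]:
    chunks = split_into_chunks(records, n_chunks)
    mapped = [map_phase(chunk) for chunk in chunks]  # each chunk mapped independently
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    logs = ["big data big value", "data moves less", "big clusters"]
    print(word_count(logs))
    # prints {'big': 3, 'data': 2, 'value': 1, 'moves': 1, 'less': 1, 'clusters': 1}
```

Because each chunk is mapped on its own, the map step needs no data from other chunks, which is what lets a real cluster run it on the machine that already holds the data instead of moving everything to one place.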
- Not causation, but correlation
“In the big data age, we do not need to know the reason behind a phenomenon; we can let the data speak for itself.” When data volumes are small, we often try to reason out the causal relationships within the data. With the arrival of the big data age, however, always searching for causes becomes a luxury. Time is limited, and we need to focus on results: if statistics over the data reveal a correlation, why it exists is no longer so important. For example, consider Wal-Mart placing Pop-Tarts next to hurricane supplies. In 2004, Wal-Mart examined its huge historical database of transactions. The records included not only each customer's shopping list and amount spent, but also the items in the shopping basket, the exact time of purchase, and even the weather on the day of purchase. Wal-Mart noticed that ahead of a seasonal hurricane, not only did flashlight sales rise, but sales of Pop-Tarts also increased significantly.
Therefore, when a seasonal hurricane approaches, Wal-Mart stocks Pop-Tarts near the hurricane supplies, making them easy for customers to find and boosting sales.
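Finding such a correlation is, at its core, a statistical computation. The sketch below ranks products by the Pearson correlation between a daily hurricane-warning flag and each product's daily sales. The sales figures are made-up numbers for illustration only, not Wal-Mart's records, and the function names are hypothetical; real retail analysis would use far more data and more careful statistics.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(signal: Sequence[float],
                        sales: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Rank items by how strongly their sales move with the signal (highest first)."""
    scores = {item: pearson(signal, series) for item, series in sales.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Made-up toy data: 1 = hurricane warning in effect that day.
    warning = [0, 0, 1, 1, 0, 1, 0, 0]
    sales = {
        "flashlights": [5, 6, 20, 22, 5, 19, 6, 5],
        "pop_tarts": [10, 11, 30, 28, 9, 27, 10, 12],
        "sunscreen": [8, 9, 7, 8, 9, 8, 9, 8],
    }
    for item, r in rank_by_correlation(warning, sales):
        print(f"{item}: r = {r:.2f}")
```

A coefficient close to 1 says only that the two series move together; in keeping with the point above, the computation says nothing about why.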