Why do we need independent data operators that will store the digital personality? Part 1.

Pedro Domingos, Professor of Computer Science and Engineering at the University of Washington, wrote the book “The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World”. I would like to introduce you to the main idea of one of the book's chapters, "This Is the World on Machine Learning".

Professor Domingos divides data into four categories: data that we share with everyone, data that we share only with friends and colleagues, data that we share with various companies, and data that we do not share at all. The first type includes, for example, reviews on Yelp, Amazon and TripAdvisor, ratings on eBay, a resume on LinkedIn, blogs, and tweets. This data is very valuable and causes the fewest problems. We share it with the world because we want to, and everyone benefits. The only difficulty is that the companies that store this data do not always allow it to be downloaded in bulk for building models. They should change their approach. Today you can go to TripAdvisor and see the reviews and ratings of the hotels that interest you, but what about a model of the factors that make a hotel good or bad in general? With its help, it would be possible to evaluate hotels that have few reliable reviews or even none at all. TripAdvisor could create something like that. And what about a model of the factors that determine how attractive a hotel is for you personally? This requires information about you, and you may not want to share it with TripAdvisor. It would be better to have a trusted third party that combines the two types of data and gives you the result.
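The idea of a trusted third party that combines hotel data with your personal preferences can be sketched in a few lines of Python. This is only an illustration and does not come from the book: the `Hotel` class, the `rank_hotels` function, the factor names, and all the numbers are made up for the example. The hotel factors stand in for the general model that a site like TripAdvisor might build, and the preference weights stand in for the private information you would rather not hand over.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hotel:
    name: str
    # Hypothetical factor scores between 0 and 1, e.g. learned from reviews.
    factors: Dict[str, float]


def rank_hotels(hotels: List[Hotel], preferences: Dict[str, float]) -> List[str]:
    """Return hotel names ordered from best to worst match for one person.

    The score is a plain weighted sum of the hotel's factors, using the
    person's private weights. Only the ordering leaves this function.
    """
    def score(hotel: Hotel) -> float:
        return sum(preferences.get(name, 0.0) * value
                   for name, value in hotel.factors.items())

    return [hotel.name for hotel in sorted(hotels, key=score, reverse=True)]


# Made-up data for illustration only.
hotels = [
    Hotel("Seaside Inn", {"quiet": 0.9, "near_center": 0.2}),
    Hotel("City Lodge", {"quiet": 0.1, "near_center": 0.9}),
]
quiet_lover = {"quiet": 1.0, "near_center": 0.2}
print(rank_hotels(hotels, quiet_lover))
```

In this sketch the site never sees the weights and the person never needs the raw review data. A real service would need far more than a weighted sum, but the division of roles is the point.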

Data of the second kind should not create problems either, but it does, because it overlaps with the third kind. We share news and pictures with our friends on Facebook, and they share theirs with us. In doing so, each of us also shares all this information with Facebook itself. The network benefits: it has a billion friends. Day after day, it learns far more about the world than any individual could. Facebook uses all this knowledge mainly for targeted advertising, and in exchange it provides the infrastructure for sharing information: this is the deal every user accepts. Learning algorithms are becoming more powerful and extract more and more value from the data, and part of that value is returned to us in the form of more relevant advertising and better services. The only problem is that Facebook is free to do things with the data and the models that go against the interests of the user, and the user cannot prevent it.

This problem appears wherever a person shares data with companies, and these days that covers almost every activity on the Internet and many in real life. Everyone wants to get your data. Yet each company holds only a fragment of the whole. Google knows what we search for on the Internet, Amazon has information about our purchases, AT&T knows about our phone calls, Apple about the music we download, Safeway has a complete picture of what food we buy, and Capital One knows about our credit card transactions. Some companies, Acxiom for example, collate information about us and sell it, but if you inspect it (aboutthedata.com), it turns out that there is not much of it and that some of it is wrong. No one has anything close to a complete picture of our personality. This is both good and bad. It is good because whoever managed to get such a picture would have too much power. It is bad because, as long as this is the case, a comprehensive model of a person cannot be built. What we really need is to be the sole owner of such a model and to grant access to it solely on our own terms.

The last type of data, the data that we do not share, also poses a problem: sometimes such information should be shared. Maybe it did not occur to you to share it, maybe there is no easy way to do so, or maybe you simply do not want to. In the latter case, it is worth considering whether we have an ethical duty to share data about ourselves. Cancer patients can contribute to the victory over this disease if they provide access to their tumor genome and treatment history. The data that we generate in our daily lives can provide answers to all sorts of questions about society and politics. The social sciences are entering their golden age: they will finally receive a volume of data that matches the complexity of the phenomena they study, and the benefits for all of us will be enormous, provided that this data is accessible to scientists, decision makers and citizens themselves. This does not mean that we should let others spy on our personal lives. It means that we need to give them access to the resulting models, which will contain only statistical information. Between us and them there should be an honest data broker that ensures that information about us is not misused and that there are no "freeloaders" who seek to benefit without sharing their own data.
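To make the role of an honest data broker a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a design from the book: the `DataBroker` class, its method names, the minimum group size, and the sample values are all hypothetical. The broker keeps individual records private, releases only an average, and, as a deliberate simplification of the "no freeloaders" idea, refuses anyone who has not contributed data of their own.

```python
from statistics import mean
from typing import Dict


class DataBroker:
    """Toy broker: holds raw values privately and releases only an average."""

    def __init__(self, min_contributors: int = 3) -> None:
        # min_contributors is an assumed safeguard, not from the book:
        # with too few people, an average would reveal individuals.
        self._values: Dict[str, float] = {}
        self._min_contributors = min_contributors

    def contribute(self, person_id: str, value: float) -> None:
        self._values[person_id] = value

    def average(self, requester_id: str) -> float:
        # No freeloaders: only people who shared data may see the result.
        if requester_id not in self._values:
            raise PermissionError("share your own data first")
        if len(self._values) < self._min_contributors:
            raise ValueError("not enough contributors for a safe statistic")
        return mean(list(self._values.values()))


# Made-up data for illustration only.
broker = DataBroker()
for person, hours_of_sleep in [("ann", 7.0), ("bob", 6.0), ("eva", 8.0)]:
    broker.contribute(person, hours_of_sleep)
print(broker.average("ann"))
```

A real broker would also have to serve scientists and decision makers and would need much stronger privacy guarantees than a minimum group size. The sketch only shows the two rules named in the text: statistics instead of raw records, and no benefit without contribution.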

Professor Domingos concludes that all four types of data have their problems and proposes a solution, which you can read about in the next post.

For more analytics posts follow this link: https://analyticsinbusinessworld.blogspot.com/2017/04/monetization-of-analytics-data.html

Olga's blog - Analytics in business

Tuesday, April 25, 2017
