Pedro Domingos, a professor of computer science and engineering at the
University of Washington, wrote the book “The Master Algorithm: How the
Quest for the Ultimate Learning Machine Will Remake Our World”. I would like
to introduce you to the main idea of one of the book’s chapters, "This Is the
World on Machine Learning".
The professor divides data into four categories: data that we share with
everyone, data that we share only with friends and colleagues, data that we
share with various companies, and data that we do not share at all. The first
type includes, for example, reviews on Yelp, Amazon and TripAdvisor, ratings
on eBay, a resume on LinkedIn, blogs, and tweets. This data is very valuable
and causes the fewest problems. We share it with the world because we want
to, and everyone benefits. The only difficulty is that the companies that
store this data do not always allow it to be downloaded in bulk for building
models. They should change their approach. Today you can go to TripAdvisor
and see the reviews and ratings of the hotels that interest you, but what
about a model of the factors that make a hotel good or bad in general? With
its help, it would be possible to evaluate hotels that have few reliable
reviews or none at all. TripAdvisor could create something like that. And
what about a model of the factors that determine how attractive a hotel is
for you personally? This requires information about you, and you may not want
to share it with TripAdvisor. It would be better to have a trusted third
party that combines the two types of data and gives you the result.
Data of the second kind should not create problems either, but it does,
because it overlaps with the third kind. We share news and pictures with our
friends on Facebook, and they share with us. In doing so, each of us also
shares all this information with Facebook itself. The network takes
advantage: it has a billion friends. Day after day, it learns far more about
the world than any individual could. Facebook uses all this knowledge mainly
for targeted advertising, and in exchange it provides an infrastructure for
sharing information: that is the deal every user makes. As learning
algorithms become more powerful, they extract more and more value from the
data, and part of that value is returned to users in the form of more
relevant advertising and better services. The only problem is that Facebook
is free to do things with the data and models that go against the interests
of the user, and the user has no way to prevent it.
This problem appears wherever a person shares data with companies, and these
days that covers almost everything we do on the Internet and much of what we
do offline. Everyone wants to get your data, yet each company has only a
small piece of the whole. Google knows what we search for on the Internet,
Amazon knows what we buy, AT&T knows whom we call, Apple knows what music we
download, Safeway knows what we eat, and Capital One knows our credit card
transactions. Some companies, such as Acxiom, collate information about us
and sell it, but if you check what they actually have (at aboutthedata.com),
it turns out to be not much, and part of it is wrong. No one has a complete
picture of who we are. This is both good and bad. It is good because anyone
who managed to get such a picture would have too much power. It is bad
because, as long as this is the case, no comprehensive model of us can be
built. What we really need is for each of us to be the sole owner of such a
model and to grant access to it solely on our own terms.
The last type of data, the data that we do not share, also poses a problem:
sometimes such information ought to be shared. Maybe it has not occurred to
you, maybe it is not easy, or maybe you simply do not want to. In the latter
case, it is worth considering whether we have an ethical duty to share data
about ourselves. Cancer patients can contribute to defeating the disease if
they provide access to their tumor genomes and treatment histories. The data
that we generate in our daily lives can provide answers to all sorts of
questions about society and politics. The social sciences are entering a
golden age: they will finally have a volume of data commensurate with the
complexity of the phenomena they study, and the benefits for all of us will
be enormous, provided that this data is accessible to scientists, decision
makers and citizens themselves. This does not mean that we should let others
spy on our personal lives. It means that we need to give them access to the
resulting models, which will contain only statistical information. Between us
and them there should be an honest data broker that ensures information about
us is not misused and that there are no "freeloaders" who seek to benefit
without sharing their own data.
Professor Domingos concludes that there are problems with all four types of
data and suggests a solution, which you can read about in the next post.
For more analytics posts follow this link: https://analyticsinbusinessworld.blogspot.com/2017/04/monetization-of-analytics-data.html
Thank you for letting us know about this book. Now I would like to read it.
Data has to be transparent before it's transformed for any business.