Big Brother/Data 2016

The power of big data, AI/analytics, and subtle data collection is converging toward a future only hinted at in Orwell’s 1984. With rapid developments on many fronts, it is not surprising that those of us who are only moderately paranoid have not been tracking it all. So here is an update on some of the recent information about who is watching you and why:

Facebook (no surprise here) has been running personality quizzes that evaluate your OCEAN score: Openness, Conscientiousness, Extroversion, Agreeableness and Neuroticism. These “free” evaluations are provided by Cambridge Analytica. The application of this data to influencing elections is documented by the NY Times (subscription required) and quoted in part by others. The short take is that your Facebook profile (name, etc.) is combined with your personality data and with “onboarding” data from other sources such as age, income, debt, purchases, health concerns, car, gun and home ownership, and more. Cambridge Analytica is reported to have records with 3 to 5 thousand data points on each of 230 million adult Americans, which is most of us.

How do they use this data? Psychographic micro-targeted advertising is the recent application, seeking to influence voting in the U.S. election. They only support Republican candidates, so other parties will have to develop their own doomsday books. There is no requirement that the use of the quizzes be disclosed, nor that the “ads” be identified as political or approved by any candidate. The ads might not appear to have any specific political agenda; they might just point out news (or fake news) stories that play to your specific personality and have been test-marketed to validate the influence they will have on the targeted voter(s). This may inspire you to get out and vote, or to stay home and not bother, depending on which candidate(s) you support (based on your social media streams, or on more generalized characteristics if you personally have not declared your preferences). Impact: quite possibly the U.S. Presidency.

But wait, that’s not all.

The U.K. is expanding its surveillance powers, requiring Internet companies to retain interactions/transactions for a year, including every web site you have accessed. This is apparently in part a response to assertions by France that similar powers had foiled an ISIS attack there. The range of use (and abuse) the UK government and its allies might apply remains to be seen (or, more likely, will remain hidden).

But consider what China is doing to encourage residents to be “sincere”. [Here is a serious limitation of my linguistic and cultural skills: no doubt there is a Mandarin word being translated as “sincere”, and it carries cultural implications that may not be evident in translation.] Data collected to determine your “social credibility rating” includes: tax, loan, bill, and other payments (on time?), adherence to traffic rules, family planning limits, academic record, purchasing, online interactions, the nature of information you post online, volunteer activity, and even “filial piety” (respect for elders/ancestors). And the applications of such data? So far 4.9 million airline tickets have been refused. Your promotion, or even job opportunities, can be limited, with “sensitive” jobs subject to review: judges, teachers, accountants, etc. A high score will open doors, possibly faster access to government services. By letting citizens see their score, they can be encouraged to ‘behave themselves better’. By not disclosing all of the data collected, nor all of the implications, the state can bully citizens into far greater sincerity than they might adopt if they were just trying not to break the law.

Your comments, thoughts and responses are encouraged, but remember — they are being recorded by others for reasons you may never know.  … Sincerely yours, Jim

It’s 10PM, do you know what your model is doing?

“Customers like you have also …” This concept appears explicitly or implicitly at many points in the web-of-our-lives, aka the Internet. Specific corporations and aggregate operations are building increasingly sophisticated models of individuals. Not just “like you”, but “you”! Prof. Pedro Domingos at UW, in his book “The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World”, suggests this model of you may become a key factor in your ‘public’ interactions.

Examples include having LinkedIn add a “find me a job” button that will conduct interviews with relevant open positions and provide you a list of the best. Or perhaps locating a house, a car, a spouse … well, maybe some things are better done face-to-face.

Apparently an Asian firm, “Deep Knowledge”, has appointed a virtual director to its board. In this case it is a construct designed to detect trends that the human directors might miss. However, one suspects that Apple might want a model of Steve Jobs around for occasional consultation, if not back in control again.

If the Computer Said it, it must be True!

Well, maybe not. “What Happens When GPS Can’t Find You?” is a commercial concern raised by a Wall St. Journal article. Needless to say, a business in today’s world is at risk if the GPS location associated with it is wrong, or if the route required to get there is not correct. Consumers at best are frustrated, and may simply write off that operation. In this case it is often not the business’s fault, but a fault in the GPS location service or route mapping.

Behind this is a more pervasive and serious problem. Often there is no way to “fix” these problems from the perspective of the consumer or an affected business. You may know the data is wrong and the route doesn’t work, but correcting the error(s) is not a straightforward path, and certainly not easy enough for a “crowd-source” solution to work. That is, many people might find the error, and if there were a simple way to “report” the problem, then after the “nth” report an automated fix (or review) could be triggered.

This is not just a GPS problem. I’ve found many web sites validating addresses against equally flawed sources (perhaps even the USPS). I can send mail to my daughter (and she gets it); I’ve even seen the mailbox on the side of her street. But one of the web sites I used to deliver items to her location rejects the address as “not known”, and of course there is no way to report the error. A related problem is entering an address in “just the right way”: am I in “Unit A101”, “Apt. A 101”, or maybe “Apt A101”? Note that the delivery folks can handle all of these, but the online ordering system can’t. Technology design consideration: track such ‘failures’, and after some number, check the validation process, or better, have a button such as “I know this is right, so please update the database”.
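The “after some number” design consideration above can be sketched in a few lines. This is a toy illustration, not any real service’s API; the class name, threshold, and addresses are all invented:

```python
# Toy sketch of crowd-sourced error reporting: after N independent
# reports of the same bad address, flag it for automated review.
# All names and the threshold are illustrative, not a real system.
from collections import defaultdict

class AddressErrorTracker:
    def __init__(self, threshold=5):
        self.threshold = threshold
        self.reports = defaultdict(set)   # address -> set of reporter ids
        self.flagged = set()              # addresses queued for review

    def report(self, address, reporter_id):
        """Record one 'I know this is right' report; flag the address
        once enough distinct reporters agree. Returns flagged status."""
        self.reports[address].add(reporter_id)
        if len(self.reports[address]) >= self.threshold:
            self.flagged.add(address)
        return address in self.flagged

tracker = AddressErrorTracker(threshold=3)
for user in ("u1", "u2", "u2", "u3"):   # u2 reporting twice counts once
    flagged = tracker.report("Apt A101, 10 Main St", user)
print(flagged)  # True once three distinct users have reported
```

Counting distinct reporters (a set, not a tally) is the design choice that keeps one frustrated user from triggering a review on their own.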

Online operations, as well as brick-and-mortar activities, are losing business due to online “presumptions” of correctness and the absence of corrective processes. It’s one thing when the word processor marks your spelling as “wrong” but lets you keep it anyway. It is another when medications or essential services can’t reach your location because the GPS or delivery address is not in the database, or is listed incorrectly.

Predictive Analytics – Rhinos, Elephants, Donkeys and Minority Report

The IEEE Computer Society published “Saving Rhinos with Predictive Analytics” in both IEEE Intelligent Systems and the more widely distributed ‘Computing Edge‘ (a compendium of interesting papers taken from 13 of the CS publications and provided to members and technologists at no cost). The article describes how data-based analysis of both rhino and poacher activity, in concert with AI algorithms, can focus enforcement activities in terms of timing and location, and hopefully save rhinos.

For those outside of the U.S.: the largest populations of elephants (Republicans) and donkeys (Democrats) are in the U.S., these animals being symbols of the respective political parties. Now, on the brink of the 2016 presidential primaries, these critters are being aggressively hunted (ok, actually sought after for their votes). Not surprisingly, the same tools are used to locate, identify, and predict the behavior of these persons. When I was young (1964) I read a book called The 480, which described that era’s capabilities for computer-based political analysis and targeting of the “groups” required to win an election (480 was the number of groupings of the 68 million voters in 1960, used to identify which groups a candidate needed to attract to win). 21st-century analytics are a bit more sophisticated, with as many as 235 million groups, or one per potential voter (and over 130 million voters likely to vote). A recent kerfuffle between the Sanders and Clinton campaigns over “ownership/access” to voter records stored on a computer system operated by the Democratic National Committee reflects the importance of this data. By cross-connecting (data mining) registered voter information with external sources such as web searches, credit card purchases, etc., the candidates can mine this data for cash (donations) and later votes. A few percentage points’ change in delivering voters to the polls (both figuratively and by providing rides where needed) in key states can affect the outcome. So knowing each individual is a significant benefit.

Predictive analytics is saving rhinos and affecting the leadership of superpowers. But wait, there’s more. Remember the movie “Minority Report” (2002)? On the surface, the movie presents computer technology able to predict future crimes by specific individuals, who are arrested to prevent the crimes. (Spoiler alert) The movie actually proposes that a group of psychics were the real source of insight. This was consistent with Philip K. Dick’s original 1956 story, written prior to The 480 and the emergence of the computer as a key predictive device. Here’s the catch: we don’t need the psychics, just the data and the computers. Just as a specific individual voting for a specific candidate, or a specific rhino getting poached in a specific territory, can be assigned a specific probability, we are reaching the point where aspects of the ‘Minority Report’ predictions can be realized.

Oddly, in the U.S., governmental collection and use of this level of big data is difficult due to privacy illusions, and probably bureaucratic stovepipes and fiefdoms. These problems do not exist in the private sector. Widespread data collection on everybody at every opportunity is the norm, and the only limitation on sharing is determining the price. The result is that your bank or insurance company is more likely than the government to be able to predict your likelihood of being a criminal, terrorist, or even a victim of a crime. Big Data superpowers like Google, Amazon, Facebook and Acxiom have even more at their virtual fingertips.

Let’s assume that sufficient data can be obtained, and robust AI techniques applied, to identify a specific individual with a high probability of a problematic event (initiating, or being the victim of, a crime in the next week), and that this data is implicit or even explicit in the hands of some corporate entity. Now what? What actions should said corporation take? What probability is needed to trigger such actions? What liability exists (or should exist) for failure to take such actions?
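To make those questions concrete, here is a toy sketch of a corporate response policy. The probabilities, thresholds, and actions are all invented for illustration; the point is that the cut-offs encode policy and liability choices, not technical facts:

```python
# Toy sketch: mapping a predicted probability of a problematic event
# to a corporate action. Every threshold and action name here is
# hypothetical; a real policy would be a legal/ethical decision.

def action_for_risk(p_event):
    """Return an action tier for a predicted event probability."""
    if p_event >= 0.90:
        return "notify authorities"   # does a duty to act exist?
    elif p_event >= 0.50:
        return "internal review"      # keep a human in the loop
    elif p_event >= 0.10:
        return "monitor"              # passive watch, at a privacy cost
    return "no action"

for p in (0.05, 0.30, 0.70, 0.95):
    print(p, action_for_risk(p))
```

Even this four-line rule raises the article’s questions: who chose 0.90, and who is liable when an event lands just below it?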

These are issues that the elephants and donkeys will need to consider over the next few years; we can’t expect the rhinos to do the work for us. We technologists may also have a significant part to play.

IoT and Healthcare

The July/August issue of IEEE Internet Computing focuses on health care applications of the Internet of Things (IoT). This morning, when I hit the Google home page, it had a birthday cake, and on “hover” it wished me a “Happy Birthday Jim”. Just in case you were wondering whether your Google entry page might be customized for you, the answer is “yes”. How do these two statements intersect? In some (near-term?) future, that page may suggest I need to visit a doctor, either because I was searching a combination of symptoms, or because the sensors surrounding me (my watch, cell phone, etc.) indicated problematic changes in my health (or some combination of data from such diverse sources).

Of course this might be followed by a message that my health insurance was being canceled, or my life insurance.

As this Internet Computing issue points out, there are many benefits to be gained from having a network of sensors that can continuously monitor and provide feedback on health data. The first paper addresses barriers: legal, policy, interoperability, user perspectives, and technological. The second paper focuses on “encouraging physical activity”, and the third considers “quality of life (QoL)” (physical health, psychological state, social relationships, and environment (financial, safety, freedom, …)). It is evident that IoT and health care have many points of overlap, some intended (monitoring devices) and some unintended (search analysis), and all with significant personal and social impact considerations.

Besides my ingrained paranoia (will Google automatically apply for my retirement benefits and direct the checks to their accounts?) and delusional optimism (“Your financial QoL is below acceptable norms; we have transferred $1 million into your accounts to normalize this situation – have a good day”), there are pros and cons that will emerge.

What issues and opportunities do you see?

Police Cameras

My daughter is attending a citizen police academy. They discussed the challenges that police cameras (body, squad car, interview rooms, traffic monitoring, etc.) present — and these related, in part, to the objectives of having such cameras.

1) When an officer is apprehending a suspect, a video of the sequence covers a topic that is very likely to be raised in court (in the U.S., where fairly specific procedures need to be followed during an arrest). Evidence related to this has to follow very specific rules to be admissible. An example of this concept is in the Fort Collins, Colorado police FAQ, where they provide some specifics. This process requires managed documentation trails by qualified experts to assure the evidence can be used. There are real expenses here beyond just having a camera and streaming or transferring the sequences to the web. Web storage has been created that is designed to facilitate this management challenge. Note that even if the prosecution does not wish to use this material, the defense may do so, and if it is not managed correctly, may seek to have charges dismissed. (For cultures where defendants are not innocent until proven guilty and/or there is no body of case or statutory defendants’ rights, this may sound odd, but in the U.S. it is possible for a blatantly guilty perpetrator to have charges against him dropped due to a failure to respect his rights.)

2) There are situations where a police officer is suspected of criminal actions. For real-time situations (like those in the news recently), the same defendants’ rights need to be respected for the officer(s) involved. Again, close management is needed.

Note that in these cases, there are clear criminal activities that the police suspect at the time the video is captured, and managing the ‘trail of evidence’ is a well-defined activity with a cost and benefit that is not present without the cameras.

The vast majority of recorded data does not require the chain-of-evidence treatment. If a proper request for specific data not associated with an arrest results in data that is used in court, it is most likely to be used by a defendant, and the prosecutor is unlikely to challenge the validity of the data, since doing so would discredit their own system.

Of course there are other potential uses of the data. It might contain information relevant to a divorce action (the couple in the car stopped for the ticket: one spouse wants to know why the other person was in the car), or the images of bystanders at a site might impact the apparent privacy of such persons. (Although, in general, no right of privacy is recognized in the U.S. for persons in public.)

The Seattle police are putting some video on YouTube, after applying automated redaction software to protect the privacy of individuals captured in the frame. Just the presence of the video cameras can reduce both use of force and citizen complaints.
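Automated redaction tools of this kind typically detect faces or license plates and then blur or pixelate those regions. As a hedged illustration of only the second step (real systems use computer-vision libraries for the detection step, which is omitted here), this stdlib-only sketch pixelates a region of a grayscale image represented as a list of lists:

```python
# Toy redaction step: pixelate a rectangular region of a grayscale
# "image" (list of lists of 0-255 ints) by averaging block values.
# Real redaction pairs this with automated face/plate detection.

def pixelate_region(img, top, left, height, width, block=2):
    """Replace each block x block tile inside the region with its mean,
    destroying identifying detail while keeping the overall scene."""
    for r0 in range(top, top + height, block):
        for c0 in range(left, left + width, block):
            rows = range(r0, min(r0 + block, top + height))
            cols = range(c0, min(c0 + block, left + width))
            vals = [img[r][c] for r in rows for c in cols]
            mean = sum(vals) // len(vals)
            for r in rows:
                for c in cols:
                    img[r][c] = mean
    return img

img = [[(r * 4 + c) * 10 for c in range(4)] for r in range(4)]
pixelate_region(img, 0, 0, 4, 4, block=2)
print(img[0][0] == img[1][1])  # True: the 2x2 tile shares one value
```

Note that pixelation averages data away rather than hiding it behind a reversible transform, which is why it is preferred over naive blurring for privacy redaction.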

There are clearly situations where the police, or the citizens involved, or both would find a video recording to be of value, even if it did not meet evidentiary rules. Of course, the concern related to such rules is the potential for inappropriate editing of the video to transform it from an “objective” witness into one biased in one direction or another.

We have the technology; should we use it? An opinion piece by Jay Stanley in SSIT’s Technology and Society journal outlines some of these issues in more detail.

Who is Driving My Car (revisited)

Apparently my auto insurance company was not reading my recent blog entry. They introduced a device, “In-Drive”, that will monitor my driving habits and provide a discount (or increase) in my insurance rates.

There are a few small problems. The device connects to the diagnostic port of the car, allowing it, or a hacker, to take control of the car (brakes, acceleration, etc.; see prior blog entry). It is connected to the mothership (ET phones home), and that channel can be used both ways, so the hacker who takes over my car can be anywhere in the world. I can think of three scenarios where this is actually feasible.

  1. Someone wants to kill the driver (very focused, difficult to detect).
  2. Blackmail – where bad guys decide to crash a couple of cars, or threaten to, and demand payment to avoid mayhem (what would the insurance company CEO say to such a demand?)  (Don’t they have insurance for this?)
  3. Terrorism – while many cyber attacks do not yield the requisite “blood on the front page” impact that terrorists seek, this path can do that — imagine ten thousand cars all accelerating and losing brakes at the same time … it will probably get the desired coverage.

As previously mentioned, proper software engineering (now a licensable profession in the U.S.) could minimize this security risk.

Then there is privacy. The insurance company’s privacy policy does not allow them to collect the data that their web page claims this device will collect, so clearly privacy is an afterthought in this case. The data collected is unclear: they have a statement about the type of data collected and, a few FAQs later, a contradictory indication that the location data is only accurate within a forty-square-mile area, except maybe when it is more accurate. What is stored, for what period of time, accessible to which interested parties (say, a divorce lawyer), and with what protections is unclear. A different insurance company, Anthem, encountered a major attack that compromised identity information (at least) for a large number of persons. I’m just a bit skeptical that my auto insurance company has done their analysis of that situation and upgraded their systems to avoid similar breaches and loss of data. For those wondering what types of privacy policies might make sense, I encourage you to view the OECD policy principles and examples. Organizations that are actually concerned with privacy would be covering all of these bases, at least in their privacy statements. (Of course they can do this and still have highly objectionable policies, or change their policies without notice.)

Your DNA into Your Picture

A recent Wall St. Journal interview with J. Craig Venter indicates his company is currently working on translating DNA data into a ‘photo of you’, or the sound of your voice. The logic, of course, is that genetics (including epigenetic elements) includes the parts list, assembly instructions, and many of the finishing details for building an individual. So it may not come as a surprise that a DNA sample can identify you as an individual (even distinct from your identical twin, considering mutations and epigenetic variations), or perhaps even be used to create a clone. But having a sample of your DNA translated into a picture of your face (presumably at different ages) or an imitation of your voice is not something that had been in my genomic awareness.

The DNA sample from the crime scene may do more than identify the perp; it may be the basis for generating a ‘police sketch’ of her face.

The movie Gattaca projected a society where genetic evaluation was a make-or-break factor in selecting a mate, getting a job, and other social decisions. But it did not venture into the possibility of not just evaluating the genetic desirability of a mate, but perhaps projecting their picture some years into the future. “Will you still need me … when I’m sixty-four?”

The interview also considers some of the ethical issues surrounding insurance, medical treatment and extended life spans … What other non-obvious applications can you see from analyzing the genomes and data of a few million persons?

Smart Government: ICT Enabled Social Engagement in Public Organizations

An SSIT Guest Blog provided by: Carlos E. Jiménez; Open & Smart Gov Specialist, IEEE SSIT Sr. Member; Barcelona, Spain.

In a broad sense, e-Government refers to the adoption of ICT by public organizations as a tool to improve how they achieve their goals. Key qualities of these organizations include efficiency, effectiveness, transparency, and a citizen-centric orientation.

However, in a more specific sense, there are important differences of degree and of elements within this field. We can distinguish four concepts: e-Administration, e-Government (in the narrow sense), Open Government, and Smart Government. These stages are incremental: ICT transforms public organizations while at the same time producing better services for citizens.

In the table, we can see that e-Administration started with ICT adoption aimed at automating workflows in public organizations (1st stage, Bureaucratic organization). Later, the e-Government stage (2nd stage, Professional organization) added interaction with citizens through electronic tools, as well as bi-directional flows of information allowing citizens to use e-services. Next, technologies contribute to and facilitate the move to a 3rd stage (Relational organization), where Open Government is achieved, enabling a high degree of the governance paradigm and not only the use of e-services. In this stage, society participates in decisions and processes that were previously handled exclusively by the organization. A 4th stage and type of public organization (Intelligent organization), following the Relational one, would be based on an optimized degree of IT adoption and on how it can transform the public organization as well as society.

Organization / Modernization Level / ICT Role:

  1. Bureaucratic (beginning): automated workflows (e-Administration). Benefit: increased internal efficiency.
  2. Professional (middle): citizen interaction (e-Government). Benefit: efficient public services (filing forms, …).
  3. Relational (advanced): citizens participating in governance (Open Government). Benefit: the governance paradigm.
  4. Intelligent (optimal: the interoperability principle completely adopted, with open innovation as a tool): an interconnected ecosystem (Smart Government). Benefits: real-time, data-driven integration of information; Public-Private-People Partnerships; …

This 4th, “refined” level of public organization would be achieved as a result of ICT being used in harmony with: a) Open Government, b) social and open innovation in public organizations, and c) a maximized interoperability principle [this concept is expanded in a special issue of IEEE Computer Magazine, Oct. 2014]. The concept of Smart Government, then, combines all of these factors, and the social implications of technology are key here.

Indeed, we have to understand that territories and cities will only be smarter if they are more social: thinking through the best options for their citizens and, especially, avoiding negative impacts of technology. To get a sense of how this looks in practice, see the case of Barcelona.

What areas of government in your territory are starting to move towards the “Smart Government” level?


Google Drive and the Titanic — UnSyncable

I have a number of files I want to share across my three primary computers, and have backed up in the cloud — “Just in case”. So when Google lowered the price for 100GB of cloud storage, I took them up on the offer … BUT …

Apparently they made a change in the last few days (circa Feb 1, 2015) and Drive now refuses to sync MP3 files. Since the Drive app does not correctly display large numbers of unsyncable files, I had to catch it in the act (with just 700+ of my 1900+ MP3 files). The message is “Download error: You do not have the permission to sync this file”. This apparently was applied to ALL MP3 files, since it includes recordings of my wife, niece, and cousins as well as CD and vinyl “rips” I have made to let me listen to that music on my computer(s), and for which I still have the original media (and I do not sell or share the copies). So it appears that Google (perhaps under pressure from the music industry) has decided to ban MP3 files from Drive. (If you are a musical artist, you obviously need another supplier.) [A later observation, after more experience and some useful feedback: while it is not clear what triggers Drive to make decisions about permission to sync, it is not the .MP3 characteristic alone. Following guidance from Google support, I completely reinstalled it on my Windows 8 system and now things sync alright … hmm]

There is a valid copyright concern from IP owners related to sharing of their content. Google has some experience with this from Google Books, where they argued “fair use” for wholesale capture, storage and indexing of libraries full of books, a position upheld in a 2013 court ruling. It is also worth noting that besides copyright on books and MP3 files, every item on Google Drive has an implicit or explicit copyright. This blog entry will have an implicit copyright as soon as I post it; actually, I think it gains that status as soon as I type it in. Every email, document, home movie or picture you take, etc. is subject to copyright law, and I can’t envision Google being able to sort out who has what permissions. And with the transition from “first sale” protections to licensing for works, things get more difficult. If I buy a book (or a DVD, CD, etc.), I can re-sell it … but if I buy a license for something (software, an ebook, etc.), my rights are limited by the license, not copyright law. (Which is why Amazon could ‘take back’ copies of Orwell’s “1984” from Kindle devices.)

While seeking to understand the problems I encountered with Drive, I discovered an interesting variation. A user reported a system infected with ransomware that encrypted his files and demanded payment to restore access. The encrypted files replaced the unencrypted files on Google Drive, which means his “backup” was no longer available (and apparently Google could not restore prior versions of the files).
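That story is a reminder that sync is not backup: if the store keeps only the latest copy of each file, an encrypted overwrite propagates everywhere. A minimal sketch of version-keeping storage (all names here are illustrative, not Google Drive’s actual design) shows what would have saved that user:

```python
# Toy versioned store: every save keeps prior versions, so a
# ransomware overwrite does not destroy the last good copy.
# A sync-only "backup" would retain just the newest (encrypted) version.

class VersionedStore:
    def __init__(self):
        self.versions = {}  # path -> list of contents, oldest first

    def save(self, path, data):
        self.versions.setdefault(path, []).append(data)

    def latest(self, path):
        return self.versions[path][-1]

    def restore(self, path, steps_back=1):
        """Roll back past the most recent (possibly malicious) writes."""
        return self.versions[path][-1 - steps_back]

store = VersionedStore()
store.save("notes.txt", "my real notes")
store.save("notes.txt", "X9$!encrypted-by-ransomware")  # malicious write
print(store.latest("notes.txt"))   # the encrypted garbage
print(store.restore("notes.txt"))  # the recoverable original
```

Real backup systems add retention windows and immutability so the malware cannot simply keep writing until the good versions age out.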

Cloud computing in its variations opens a batch of new social implications: copyright, protection of content, loss of content, etc. What other challenges do you see for the Cloud?