Yes, @healthyswimmer is explaining it right. In itself, data mining is neither good nor bad. In fact, it is often used for good purposes, to get insights from huge data sets that would otherwise be impossible for humans to work out. It can be used for, e.g., demographics, planning improvements to a city’s infrastructure, studying weather patterns, or understanding how microbial life affects the human body.
We are not campaigning about these uses, and maybe we should refine the title of this topic to be more accurate.
What we are worried about is the data mining of personally identifiable data, or PII, after it has been ‘data harvested’. Undermining is a good word, because it often happens without your awareness and you have no control over it.
Once your PII data leaves your device, you completely lose control over it:
- You lose control over who owns it, how it spreads, and how it is used
- You lose control over how long it is stored, how it is combined and enriched
- You lose the ability to correct or remove it if it contains errors or is inaccurate
- You are mostly, or even completely, in the dark about what data on you exists at all
So anyone may work with it for their own gain, whether with good intentions or not. They create software, products and services that process it. Some of these systems are buggy and inaccurate from the start. Others, more nefariously, deliberately manipulate your PII in improper ways, or leave things out of the picture, so that it is perceived as more valuable by the subsequent customers that consume it.
For example, there is a growing awareness among marketers and advertisers that the ads they buy are not targeted all that effectively at the prospects they want to reach, and that advertising platforms embellish the metrics and focus on generating clicks, while the actual conversion rate is low.
That is just one example, but it shows that the advertising company may not only fail to prioritize the wellbeing and satisfaction of its users (i.e. the products), but may also not really care about its customers, the ad buyers, as long as they keep buying ads.
Another hypothetical example (that may already exist): if I create an online AI service that sells deep insights on job applicants to employers, who are the customers of my service, then my product requirement is to do thorough analysis and inference and subsequently compile that into short, crystal-clear reports for the employer. Ideally these end with a conclusion about the applicant, such as ‘Suited’ / ‘Not Suited’.
People tend to place a huge amount of trust in such systems, especially if they are presented on highly professional, smart-looking websites, with references to scientific papers, etc. Your potential employer may now completely forgo their traditional selection process. They get 400 applicants, press a button, and after AI analysis only 20 of them remain. Automated rejection mails are sent to the others. At no point in this process did a human look at your CV and application letter, nor did you get an inquiring call.
You may be rejected, even though you were the best candidate in reality! The analysis drew conclusions from your social profile across all the years in which data was collected. That data includes your wild years as a student, when you were partying and posting pictures of it, and not always expressing yourself all that eloquently. Why would you, at that great time in your life? But after graduation, you may have completely changed your professionalism and outlook.
Yet that old data still weighs heavily in the judgment that the AI makes about you. And you are unaware of that.