Undermined: Explaining Why Data Collection and Mining can be Harmful

In our mission we say that technology should serve us, yet we are subservient to it. I just bumped into this picture in a tweet:


Since the data extraction and analysis are done involuntarily or without our knowledge, you could also say that we are slaves or victims.


I just had an interesting conversation where someone suggested that data mining is a natural progression of technology that is simply to be absorbed and ignored. What exactly do companies do with our data that makes it harmful?


@sidnya this is a very good question. One thing I explain to people: suppose you research something personally private that you wouldn’t even share with your pet ;). This data could be stored and sold, and at that point your data becomes a commodity. What would people do with this information? Well… what if we researched something about drug rehab or mental illness, and someone sold that info to the government someday? It might affect your chances of getting an important job, or of becoming a politician. Right now, data leaks out on social media and literally ruins people’s careers. The fact is, nobody is perfect, and we all have a right to make mistakes or inquire about “life” without this information being sold.

So to answer your question, I tell people the government-job story, or the fighter-pilot one: you need a squeaky-clean record for these things. I haven’t heard of people missing opportunities like these because of privacy breaches, but we don’t actually know the extent to which our privacy is invaded.


Yes, @healthyswimmer is explaining it right. In itself, data mining is neither good nor bad. In fact, it is often used for good purposes, to get insights from huge data sets that would otherwise be impossible for humans to work out. It could be used to study demographics, improvements to a city’s infrastructure, weather patterns, or how microbial life affects the human body.

We are not campaigning about these uses, and maybe we should refine the title of this topic to be more accurate.

What worries us is the mining of personally identifiable information, or PII, after it has been ‘harvested’. Undermining is a good word for it, because it often happens without your awareness and you have no control over it.

Once your PII leaves your device, you completely lose control over it:

  • You lose control over who owns it, how it spreads, and how it is used
  • You lose control over how long it is stored, how it is combined and enriched
  • You lose control of the ability to correct or remove it, if it contains errors or is inaccurate
  • You are mostly completely in the dark about what data on you exists at all

So anyone may work with it for their own gain, whether with good intentions or not. They create software, products, and services that process it. Systems that are buggy and inaccurate at the start. Or even, more nefariously, systems that deliberately manipulate your PII in improper ways, or leave things out of the picture, so that it is perceived as more valuable to the subsequent customers that consume it.

For example, there is growing awareness among marketers and advertisers that the ads they buy are not all that effectively targeted to the prospects they want. Advertising platforms embellish their metrics and focus on generating clicks, while actual conversion rates are low.

That is just one example, but it shows that the advertising company may prioritize neither the wellbeing and satisfaction of its users (i.e. the products) nor its actual customers, the ad buyers, as long as they keep buying ads.

Another hypothetical example (one that may already exist): suppose I create an online AI service that sells deep insights about job applicants to employers, who are the customers of my service. My product requirement is then to do thorough analysis and inference, and to compile the result into crystal-clear short reports for the employer, ideally drawing a conclusion about the applicant at the end, such as ‘Suited’ / ‘Not Suited’.

People tend to place a huge amount of trust in such systems, especially when they are presented on highly professional, smart-looking websites with references to scientific papers, etc. Your potential employer may now completely forego their traditional selection process. They get 400 applicants, press a button, and after AI analysis only 20 of them remain. Automated rejection mails are sent to the others. At no point in this process did a human eye see your CV and application letter, nor did you get an inquiring call.

You may be rejected while in reality you were the best candidate! But the analysis drew conclusions from your social profile across all the years the data was collected. Data which includes your wild years as a student, when you were partying, posting pictures of it, and not always expressing yourself eloquently. Why would you, at that great time in your life? But after graduation, you may have completely changed your professional attitude and stance.

But that data still weighs heavily in the judgment the AI makes about you. And you are unaware of it.
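A minimal sketch of how such a push-button screening could work. Everything here is invented for illustration (the names, the “sentiment” numbers, the scoring rule, and the cutoff); the only point it demonstrates is that a naive model which weighs old data equally with recent data can auto-reject the better candidate.

```python
# Hypothetical screening service: all data and rules below are made up.

def score(applicant: dict) -> float:
    # Naive model: every year of social-media history counts equally,
    # so posts from student years weigh as much as recent ones.
    posts = applicant["posts_by_year"]  # {year: sentiment in [-1, 1]}
    return sum(posts.values()) / len(posts)

def screen(applicants: list, cutoff: float = 0.2) -> list:
    # "Press a button": keep only applicants above the cutoff.
    # No human ever reads the rejected CVs.
    return [a["name"] for a in applicants if score(a) >= cutoff]

applicants = [
    # Strong recent profile, but wild student years drag the average down.
    {"name": "Alex", "posts_by_year": {2012: -0.8, 2013: -0.6, 2018: 0.9, 2019: 0.9}},
    # Mediocre but uniformly bland profile passes.
    {"name": "Sam", "posts_by_year": {2018: 0.3, 2019: 0.3}},
]

print(screen(applicants))  # -> ['Sam']: Alex is auto-rejected
```

Alex averages 0.1 and is cut; Sam averages 0.3 and gets through, even though Alex’s recent record is far stronger. The applicant never learns which data tipped the scale.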


So it’s sort of like meeting a friend and giving them a file with everything about you in it. Then that friend turns around and gives it to a stranger. The stranger can now do whatever they want with that information.


Yes, exactly, but more accurately: your friend rewrites parts of the file, adding things he thinks he knows about you from experience and opinion, and leaving some other things out. Then he presents it to the stranger, who adds some info from another friend and again leaves something out, before passing it on another time.


This article in Politico shows another example of exactly what we are talking about in this topic:

A quote:

Over the past year, powerful companies such as LexisNexis have begun hoovering up the data from insurance claims, digital health records, housing records, and even information about a patient’s friends, family and roommates, without telling the patient they are accessing the information, and creating risk scores for health care providers and insurers. […]

No law prohibits collecting such data or using it in the exam room. Congress hasn’t taken up the issue of intrusive big data collection in health care. It’s an area where technology is moving too fast for government and society to keep up.

The opioid crisis is used as a rationale for allowing very loose regulation and dubious practices (similar to how the war on terror is used to invade privacy).


@aschrijver this is absolutely a violation of the HIPAA privacy act in the US. Per our government, pharmacies have to provide information on every opioid prescription to a national database; this way every pharmacy knows if a particular patient has filled opioid prescriptions. This is absolutely necessary to protect everyone: patients, doctors, and pharmacies. I’ve seen the visible result of regulation, and it works.

What doesn’t work is insurance companies sharing information with anyone outside the immediate healthcare setting caring for a specific patient. At my hospital, if I look up information on my neighbor or anyone I’m not directly caring for, my hospital could be fined a five-figure sum. This means the opioid score should be completely private and secure.

If you know of anyone who is sharing this information, it should be reported immediately.

Here is a summary of HIPAA; it describes how health information is kept secure in the US, something our government did right. It is actually very hard to get patient information today.


We’ve changed everything we do based on this law, and insurance companies have to do the same. If the writer of this article actually knows information is being sold, they need to report it; otherwise I doubt they really understand the law.


@sidnya I think you’re bringing up a really good point here.

So far, most communications on this subject are, in a word, nerdy. Tech people, writers, various levels of sociopolitical geeks. We care about this stuff. But how good are we at communicating the serious harms?

I work in marketing, in the belly of the beast, to support my art and journalism habits. When I read conversations here on Humane Tech, or at lower-tech sites that are campaigning to get the surveillance tech craziness under control, I imagine how my colleagues at, say, Plazm Design (plazm.com) would handle the job.

We’d do it strategically. We’d start with values and mission, then target audience, then content strategy. After these essential building blocks were in place, we’d move into making the actual content. That is certainly how we’d do it with our Fortune 500 clients! Or even a little local startup business.

The grassroots activism model has limitations. We have well-intentioned folks posting random stuff, essentially. Sometimes we presumptuously announce that we’re starting a “campaign” based on this idea or that idea. Someone makes a random graphic and posts it. Someone else gets an article published someplace. There’s not much consensus, not much strategy. It’s scattershot.

Personally I find all the infographics I’ve seen quite inadequate. They’re not getting to the real point, which is, how do we convert People Who Think They Don’t Care About Privacy? Especially the ones who are very busy, distracted, and don’t do a lot of deep thinking, questioning, or long-form reading?


Nice points, @magdalen! We would love it if you helped bring more structure here, even if we still work with limited tools. And your professional expertise is most welcome too :smiley:

We are only at the start of our Awareness Program, and there is a lot to do with still only a few people active. But that last part is about to change this year, I expect, as we start to cooperate with CHT on a number of things.


I have an idea to help increase structure. What if we created separate groups for the campaigns themselves (like there is a campaigner group)?

Projects would have to submit a proposal to the community, and after it got a set number of likes or comments, it could be granted a group. The person who proposed the project could become moderator of the granted group. Alternatively, the proposal could be submitted to the moderators for some kind of approval. We could collectively create approval criteria through a wiki post.

This would allow more structured collaboration and make it easy to distinguish projects from normal forum posts. Groups also have their own forums, so this could keep noise out of the main forum. I think it would also work with the existing structure of the HTC website.

I think some more structure would be cool and would help with organization and effectiveness. I do think the grassroots movement can overcome its limitations with collective effort.


Yes, this is the idea, but in our current setup collaboration occurs on GitHub. We have to think about improvements to the workflow, but we are really busy now.

The problem with forum groups is that their interaction happens out of sight, and groups need to be maintained by forum moderators. The structure we currently have is not so bad, but it needs clearer documentation. We’ll get to this in due time, I promise. The Awareness Program is really important to us.


I have another data-mining-related question. Today I got a YouTube ad created specifically for people in my town, and another one for a homework website similar to one I already use. How does this happen? Where do they get this information?

Oh, that is really basic tracking, and it could be determined in numerous ways, e.g. from your IP address or GPS location, by connecting to free wifi in a bar, and so on. Your homework site may have trackers, or you clicked a hyperlink to another site that has them, etc.
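To illustrate just one of those mechanisms: ad platforms can map an IP address to a region using databases of IP ranges. A minimal sketch of the idea (the town name and IP range here are invented, and real platforms use large commercial geo-IP databases rather than a hand-written table):

```python
import ipaddress

# Hypothetical, hand-written geo-IP table for illustration only.
# 203.0.113.0/24 is a reserved documentation range, not a real town's ISP.
CITY_IP_RANGES = {
    "Springfield": [ipaddress.ip_network("203.0.113.0/24")],
}

def guess_city(ip: str) -> str:
    """Return the town whose known IP ranges contain this address."""
    addr = ipaddress.ip_address(ip)
    for city, networks in CITY_IP_RANGES.items():
        if any(addr in net for net in networks):
            return city
    return "unknown"

print(guess_city("203.0.113.42"))  # -> Springfield
print(guess_city("198.51.100.7"))  # -> unknown
```

Every web request you make exposes your IP address to the server, so this lookup requires no cooperation from you at all; that is what makes town-level ad targeting so cheap.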

A fun exercise: download the dataset that Google has about you (via Google Takeout). There might be gigabytes of raw data about you in there.


@aschrijver I’m going to try that over the weekend!

Maybe we could create a wiki with links to the pages for each GitHub project (not sure if that already exists), sort of like a directory on this website. I know this already exists on the GitHub page itself, but having something similar on the forum may be helpful.

Yes, we’ll have to find the easiest way to get an overview of the community, I agree. A lot of this will be placed on the aforementioned FAQ page and the Community Website, but more is needed… reorganizing the forum itself has to be done first of all.


How does that occur? Through the CHT? (I’m still a bit new haha. Don’t quite know)

No, by us members doing the job ourselves. But it’ll take some time. Our community team is really busy now, but we hope to delegate some of our load to others soon :slight_smile:


That is really cool! Anything I can do to help out?


What is very valuable for us, since you just joined the community with a fresh perspective, is feedback on perception and organization. You can provide that in the FAQ Discussion topic, so we can take it into consideration while we work on the new forum structure.