Why Ethical Use of Data Is So Important
The collection and use of people’s data is a hot topic. Industries from journalism to web development to marketing require data to function, and many businesses use it to reach their goals and outperform their competitors. But how do you collect the information you need without violating people’s privacy? How do you know you’re working with an ethical proxy provider? What’s the right way to use that data once you have it?
These are the main questions companies have to wrestle with when determining the ethical use of data. Web scraping has picked up a bad name, due in part to collectors who do it unethically, but it doesn’t have to be a shady practice.
Let’s talk about some of the ethical considerations surrounding data collection and data use. We’ll provide some guidelines on how to do it the right way and recommend the best proxies and web scraping tools you can use to get started.
The Ethics of Web Scraping
Web scraping is a bigger part of our everyday lives online than we might think. The sites we use every day, like Google or YouTube, scrape data to determine what to recommend. If we post a video to YouTube, data gets scraped from the video to create the thumbnail for it.
Web scraping becomes a problem when information is harvested without consent, when people’s IP addresses are used as proxies without their knowledge, or when people’s data is used unethically.
If you plan to use web scraping in your business to gain data on your industry or your competitors, you might be tempted to go with the cheapest and most convenient options for software. You may not even think to question where the proxies you use come from, or whether they were obtained ethically.
But where you get your proxies from, and how you scrape data, matters in a big way. If you’re pulling free proxy IP addresses from the internet, they may have been stolen and sold without that person’s knowledge or consent. The person whose cell phone you’ve plugged into your browser extension as a residential proxy might have no idea their device is being used that way. Believe it or not, there are companies that allow these practices, and Blazing SEO will do whatever we can to shine some light on this industry.
Another thorny ethical area when it comes to web scraping is page traffic. If you’re hitting one site with a deluge of requests from multiple proxies, the site administrator might think they’re being attacked. Distributed denial-of-service (DDoS) attacks work in a similar way, flooding a site’s servers with requests until they freeze.
In order to conduct yourself ethically online, you need to consider these factors when scraping data. Luckily, there are methods you can use to respect both the websites you scrape and the people whose devices serve as your proxies.
Best Practices for Ethical Data Collection
After years in the data science industry, Data Science and Data Engineering Consultant James Densmore has compiled a list of tenets to abide by when scraping data for his work.
In a blog post for Towards Data Science, Densmore outlines best practices for both web scrapers and site administrators to use, including:
- Using a public API and avoiding web scraping entirely if there’s one available.
- Requesting data from web pages at a reasonable rate so they don’t get overloaded (as with a DDoS attack).
- Never passing off the data collected as your own.
- Scraping with the goal of creating new value from the data instead of duplicating it.
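To make the "reasonable rate" point concrete, here is a minimal Python sketch of a per-host throttle. The class name PoliteThrottle and the two-second default are our own assumptions for illustration, not something taken from Densmore's post; the right delay depends on the site you are working with.

```python
import time
from urllib.parse import urlparse


class PoliteThrottle:
    """Enforce a minimum delay between requests to the same host.

    Hypothetical helper for illustration; the default delay is an assumption.
    """

    def __init__(self, min_delay_seconds=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay_seconds = min_delay_seconds
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # host -> time of the most recent request

    def wait(self, url):
        """Pause until it is polite to request `url`; return the seconds waited."""
        host = urlparse(url).netloc
        last = self._last_request.get(host)
        waited = 0.0
        if last is not None:
            remaining = self.min_delay_seconds - (self._clock() - last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
        # Record when this request goes out so the next one can be spaced from it.
        self._last_request[host] = self._clock()
        return waited
```

You would call throttle.wait(url) right before each request. Because the delay is tracked per host, spreading requests across several proxies does not change how often any one site gets hit.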
Web scraping has become very easy with today’s technology. But just because we can do something doesn’t necessarily mean we should.
With data breaches and cyberattacks on the rise, people are more aware than ever of how their data is being used. And they want to make sure that companies are behaving ethically. Being responsible with your data collection and use shows those people that your business can be trusted with their information.
“…high volume web scraping for questionable commercial use that gets the most attention and poses the highest risk for those of us who rely on the vast data of the web to innovate, learn, and create new value,” says Densmore. “With a little respect, we can keep a good thing going.”
It can be difficult to find someone who actually offers ethically sourced proxies. Some businesses claim to conduct themselves well, but don’t actually follow through. So, what should you look for to be sure you’re dealing with someone who actually cares about ethics?
For starters, you should look at how they obtain their proxies. Residential IP addresses especially can be harvested and used without people’s knowledge as proxies, so it’s important to know where you’re really getting yours from.
Why Ethically Sourced Proxies Are Important
Proxies are an integral part of web scraping. Put simply, they act as an intermediary between your computer or device and the internet.
Instead of your device sending out its own IP address, it sends a request to the proxy, which sends a request to the internet. Then, the proxy forwards the page back to you. This masks your identity and location.
That anonymity can be used to send multiple requests to a web page to scrape data. Since residential proxies use the IP addresses of individual devices like computers or phones, they’re less likely to be banned by websites and therefore are in higher demand.
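As a rough illustration of that request flow, the Python sketch below builds an opener from the standard library's urllib that sends traffic through a proxy. The helper name build_proxy_opener and the proxy address are placeholders we made up; substitute the details from a provider you have vetted.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that routes HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical proxy address, used here only as a placeholder.
opener = build_proxy_opener("http://proxy.example.com:8080")

# A call such as opener.open("https://example.com/") would now reach the site
# via the proxy, so the site sees the proxy's IP address instead of yours.
```

Nothing is sent over the network until opener.open is called, so the sketch is safe to run as-is.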
How do you obtain IP addresses ethically? There are three main ways to do it:
- Through a software development kit (SDK): App developers use SDKs to offer users an ad-free app experience if they let the app use their device’s IP address as a residential proxy when the device is idle.
- By paying people directly: CashRaven pays people directly to use their device IPs as proxies. Eventually, people who opt for this method will be able to control which domains their IP is used as a proxy for, giving them full control over how their IP gets used.
- By buying proxies from trusted third-party providers: Occasionally, IP addresses obtained through vetted partners can be sold for use as proxies.
Blazing SEO uses these practices for both our data center and residential proxies. Any company that sells proxies should engage in a thorough vetting process. You’ll notice on our residential proxy page that there is no option to sell us proxies directly. We need to know they were obtained the right way before we buy them.
Ethical vetting processes around proxies should include product demonstrations. Buyers should be required to sign a legally binding contract holding them liable for any use of the proxies they purchase that’s outside of ethical standards.
Another best practice is to vet buyers through a rigorous “know your customer” process so you know who they are and what they do. When looking for someone to buy proxies from, you should be looking for a company that holds people to these standards.
How to Ensure Ethical Data Practices
There are a few simple rules you can follow to make sure you’re behaving ethically when collecting data online. In addition to the ones Densmore listed, you can also:
- Identify yourself to the site administrator. They’ll notice a lot of requests coming to their page and wonder why it’s happening. Identifying yourself and letting them know what’s going on will help things go much more smoothly than keeping them in the dark.
- Ask for permission to get the data you want. Don’t assume it’s free for you to take, and only use the data that you need.
- Give back and provide value when you can. If you’re using the data in a report, cite where you got it. An article? Same thing. Link back to the site if you can. They’ll appreciate the credit and the extra traffic.
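One common way to identify yourself is in the request headers. The Python sketch below is only an illustration: the bot name, info page, and email address are hypothetical values, and a direct note to the site administrator is still worthwhile on top of this.

```python
import urllib.request

# Hypothetical values: use your own bot name, info page, and contact address.
BOT_NAME = "ExampleResearchBot/1.0"
INFO_URL = "https://example.com/bot-info"
CONTACT_EMAIL = "data-team@example.com"


def build_identified_request(url):
    """Create a request whose headers tell the site admin who is asking."""
    headers = {
        # User-Agent names the scraper and links to a page explaining it.
        "User-Agent": f"{BOT_NAME} (+{INFO_URL})",
        # From is a standard HTTP header that carries a contact email address.
        "From": CONTACT_EMAIL,
    }
    return urllib.request.Request(url, headers=headers)


# Building the request does not send it; pass it to urlopen or an opener to do that.
request = build_identified_request("https://example.org/products")
```

An administrator who sees these headers in the server logs has somewhere to go with questions instead of having to guess who is behind the traffic.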
Also, read through the website’s terms and conditions and robots.txt file. They tell you what is and isn’t allowed on the site and can save you trouble down the line.
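Checking robots.txt can be automated. This Python sketch uses the standard library's urllib.robotparser on a made-up robots.txt, inlined so the example runs offline; the bot name and URLs are hypothetical. Against a real site you would typically point the parser at the live file with set_url and read instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs without a network.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def load_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(SAMPLE_ROBOTS_TXT)
agent = "ExampleResearchBot"  # hypothetical bot name

print(rules.can_fetch(agent, "https://example.com/products"))        # True
print(rules.can_fetch(agent, "https://example.com/private/report"))  # False
print(rules.crawl_delay(agent))                                      # 5
```

Keep in mind that robots.txt only covers crawling rules. The terms and conditions still need a human read.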
Following these guidelines sets you apart from the businesses that don’t care. It lets people know they can trust you, and you can run your company with a lighter conscience.
The Ethical Issues of Big Data
As information has become an invaluable commodity, it has raised ethical issues in the big data industry. The data that companies like Google and Facebook gather is useful to marketers, and it is allegedly collected to improve the user experience, but those companies have faced pushback over how much they collect on individual users.
Google is trying out different tracking methods, like its FLoC model, to see if it can replace cookies and still satisfy advertisers. Facebook is repeatedly facing questions over the ethical considerations of data collection and processing, particularly after acquiring WhatsApp and changing the terms of service.
When using data scraping today, especially on such a massive scale, it’s incumbent on companies to conduct themselves ethically and respect users’ privacy.
Collecting and using people’s data comes with heavy responsibility. Not only should big data companies agree to conduct themselves according to strong ethical guidelines, but there should also be checks in place to make sure they’re walking the walk. The same goes for small businesses and any other entity that uses data collection as a regular part of its work.
When looking for data online, use an API if the site provides one. If you have to scrape data instead, be upfront about it with the site admin and use ethically sourced proxies. Take the time to read and respect the website’s terms of service — they’re there for a reason.
Only collect and keep the data you need, and give credit back to the original owner. Never pass off scraped data as your own. If possible, link back to the website you got the data from and credit the owner to create some value for them in exchange for the information.
Ethical use of data and ethical data collection are possible. We believe we’re setting the standard for ethical proxy use, and invite you to ask us any questions you might have about it. We’d be happy to answer them.