Proxy Scraper: How to build one with Stabler.tech

Introduction

In the world of web scraping, proxies are essential tools. They act as intermediaries, masking your IP address and allowing you to access websites without being detected or blocked.

However, finding a consistent source of reliable and up-to-date proxies can be a real challenge. Many free proxy lists are often outdated or compromised, rendering them useless for serious scraping projects. This is where Stabler.tech comes in.

Stabler.tech is a powerful, user-friendly web scraping platform designed to simplify your data extraction tasks. Its intuitive tools and robust features make it an ideal choice for scraping public proxy lists, ensuring you have a constant supply of fresh and functional proxies for all your web scraping needs.

 

Step 1: Finding Reliable Sources for Public Proxy Lists

The success of your scraping project hinges on the quality of your proxies. While free public proxy lists can be tempting, they often come with risks - outdated information, compromised proxies, or low performance. Finding trustworthy sources is key to ensuring you have a reliable pool of IP addresses to work with.

Here’s a detailed guide to finding those golden nuggets of proxy lists:

Specialized Proxy Websites:

These websites focus primarily on proxy-related services, making them more likely to maintain up-to-date and high-quality lists. Here are a few examples:

  • HideMyName: https://hidemyname.io/en/proxy-list/ - Offers a free proxy list updated daily, with filtering options for anonymity level, country, and protocol.
  • ProxyScrape: https://proxyscrape.com/ - Provides a wide range of free and premium proxy lists categorized by type, country, and anonymity. They also offer an API for programmatic access to their lists.
  • Free-Proxy-List.net: https://free-proxy-list.net/ - A simple, straightforward website with a frequently updated list of free proxies.
  • Spys.one: http://spys.one/en/ - This website provides a list of free proxies with detailed information like latency, uptime, and last checked time.
  • SSLProxies.org: https://sslproxies.org/ - Focuses on providing free HTTPS proxies, ideal for secure web scraping tasks.

Proxy Forums and Communities:

Active online communities dedicated to web scraping and proxies are goldmines for finding fresh proxy lists. These forums often have dedicated sections for sharing and discussing free proxies. Some popular options include:

  • Black Hat World: https://www.blackhatworld.com/ - A large forum with a section dedicated to proxies. Be cautious, as not all shared lists are reliable.
  • Web Scraping Forum: https://www.webscrapingforum.com/ - A forum specifically for web scraping enthusiasts, with discussions and shared resources related to proxies.

GitHub Repositories:

Several GitHub repositories maintain curated lists of public proxies. These lists are often updated regularly through community contributions. Search for repositories with keywords like "free-proxy-list," "public-proxies," or "proxy-list."
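
If you want to pull one of these GitHub-hosted lists programmatically, a short Node.js sketch like the one below does the job. Note that the raw-file URL is a placeholder; substitute the repository you settle on.

  // Node.js 18+ (ships a global fetch). The URL is a placeholder.
  const LIST_URL = 'https://raw.githubusercontent.com/some-user/proxy-list/main/proxies.txt';

  async function fetchProxyList() {
    const response = await fetch(LIST_URL);
    if (!response.ok) throw new Error(`Fetch failed: ${response.status}`);
    const text = await response.text();
    // Most list files hold one "ip:port" entry per line; keep only well-formed lines.
    return text
      .split('\n')
      .map((line) => line.trim())
      .filter((line) => /^\d{1,3}(\.\d{1,3}){3}:\d{2,5}$/.test(line));
  }

  fetchProxyList().then((proxies) => console.log(`${proxies.length} proxies loaded`));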

Evaluating List Sources:

When evaluating any source, consider these factors:

  • Update Frequency: Look for lists that are updated daily or at least weekly. Older proxies are more likely to be non-functional or blocked.
  • List Size and Variety: A larger list with diverse proxy types and countries provides more options.
  • Anonymity Level: Opt for lists that specify the anonymity levels of the proxies (transparent, anonymous, elite).
  • User Feedback: Pay attention to user reviews, comments, and discussions about the list source.

Remember: Public proxies are inherently volatile. Even the best sources can't guarantee 100% reliability. Always test and validate proxies before using them in your scraping projects, and be prepared to refresh your list regularly.

Warning: Make sure the proxies you are using support HTTPS, which means the traffic remains encrypted when you use them. Also check that the certificate of the website you visit is valid, to avoid man-in-the-middle attacks. A quick way to verify both is sketched below.
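
As a quick verification, the following Node.js sketch asks a proxy to open a CONNECT tunnel to an HTTPS site and then validates the site's certificate during the TLS handshake. The proxy address is a placeholder, and the sketch assumes a plain HTTP proxy that supports CONNECT.

  const http = require('http');
  const tls = require('tls');

  // Placeholder proxy address - replace with an entry from your list.
  const PROXY_HOST = '203.0.113.10';
  const PROXY_PORT = 8080;
  const TARGET = 'example.com';

  // Ask the proxy to open a tunnel to the target's HTTPS port.
  const req = http.request({
    host: PROXY_HOST,
    port: PROXY_PORT,
    method: 'CONNECT',
    path: `${TARGET}:443`,
  });

  req.on('connect', (res, socket) => {
    if (res.statusCode !== 200) {
      console.error(`Proxy refused the tunnel: HTTP ${res.statusCode}`);
      socket.destroy();
      return;
    }
    // rejectUnauthorized makes Node validate the certificate chain -
    // exactly the man-in-the-middle check described above.
    const secure = tls.connect(
      { socket, servername: TARGET, rejectUnauthorized: true },
      () => {
        const subject = secure.getPeerCertificate().subject;
        console.log(`HTTPS OK via proxy; certificate subject: ${subject && subject.CN}`);
        secure.end();
      }
    );
    secure.on('error', (err) => console.error(`TLS/certificate error: ${err.message}`));
  });

  req.on('error', (err) => console.error(`Proxy unreachable: ${err.message}`));
  req.end();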


Step 2: Setting Up Your Stabler.tech Account and Integrating Proxies

Before you can unleash the power of Stabler.tech for scraping those valuable proxy lists, you'll need to set up an account and configure it to work seamlessly with proxies. Here's a comprehensive guide to get you started:

Create Your Stabler.tech Account:

  • Visit the Website: Navigate to https://stabler.tech and click on the "Sign Up" button, located in the top right corner of the page.
  • Choose a Plan: Stabler.tech offers a variety of plans tailored to different needs and budgets. They typically provide options like:
    • Free Trial: This allows you to explore the platform's features for 7 days.
    • SOLO Plans: These plans are designed for solo users with varying scraping requirements, often differing in the number of scrapers you can create, the number of requests you can make, and the level of support provided.
    • PLUS Plans: Ideal for collaboration, these plans provide features for managing multiple users and sharing configurations within a team.
  • Complete the Registration Form: Provide the required information, including your email, a secure password, and billing details. You might also be asked for optional information like your company name or website.
  • Verify Your Email Address: You'll receive a confirmation email with a verification link. Click on the link to activate your Stabler.tech account.

Integrating Proxies:

Stabler.tech offers two primary methods for integrating proxies:

  • Partner Integration:
    • Connect to a Partner: Stabler.tech partners with reputable proxy providers to offer seamless integration.
    • Navigate to "Proxy Farm": Once logged in, find the section in your dashboard related to proxy management, usually called "Proxy Farm."
    • Choose a Partner: Select your preferred proxy provider from the list of available partners.
    • Follow the Setup Process: Stabler.tech provides a guided process for connecting your account with the proxy provider. This typically involves authorizing Stabler.tech to access your account with the proxy provider, selecting the desired proxy types, and configuring location settings (if required).
  • Custom Proxy Setup:
    • Acquire Proxies: Purchase proxies from a trusted proxy provider or use a list of free public proxies you've gathered.
    • Add Your Proxies: In the "Proxy Farm" section, you'll find an option to add custom proxies.
    • Provide Proxy Details: Enter the necessary information for each proxy (a sample structure is sketched after this list), including:
      • Server Host: The IP address or hostname of the proxy server (e.g., "192.168.1.100" or "proxy.example.com").
      • Server Port: The port number used to connect to the proxy server (e.g., "8080").
      • Authentication: If the proxy requires authentication, select the "Needs Authentication" option and provide the username and password.
    • Test Your Proxies: Stabler.tech usually offers a feature to test the functionality of your added proxies. This ensures they're working correctly before you use them in your scraper.
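
To keep the gathered details organized before entering them, it can help to hold them in a simple structure. The shape below is purely illustrative; it mirrors the form fields above rather than any import format Stabler.tech prescribes.

  // Illustrative only - fields mirror what the "Proxy Farm" form asks for.
  const customProxies = [
    { host: '192.168.1.100', port: 8080, auth: null },
    { host: 'proxy.example.com', port: 3128, auth: { username: 'user', password: 'secret' } },
  ];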

Managing Your Proxies:

  • Set Default Proxy: Choose a default proxy from your list. This will be automatically selected when creating new configurations.
  • Organize with Tags: Use tags to categorize and manage your proxies based on factors like location, type, or performance. This makes it easier to select the right proxies for specific scraping tasks.
  • Monitor Proxy Health: Stabler.tech may provide tools for monitoring the performance and health of your proxies, allowing you to identify and replace non-functional proxies quickly.

By following these steps, you'll have a well-configured Stabler.tech account ready to leverage the power of proxies for efficient and successful web scraping. Now you're all set to move on to the next step - building your proxy scraper!

Step 3: Building a Scraper Using Stabler.tech's Configuration Console

Stabler.tech's Configuration Console is where the magic happens. It provides a visual interface for building your web scraper using a drag-and-drop system of modules. Each module represents a specific action the scraper will perform, from navigating websites to extracting data and storing it.

Let's break down the process of building a proxy scraper step-by-step:

1. Access the Configuration Console:

  • From your Stabler.tech dashboard, navigate to the "Configurations" tab.
  • Click on the "New Config" button to create a new configuration.
  • Give your configuration a descriptive name (e.g., "Public Proxy Scraper").
  • Choose a proxy from your Proxy Farm. Stabler.tech offers integration with reputable proxy providers, or you can use your own.
  • Click "Create New Config."

2. Using the "Browse Static Pages" Module:

  • The first step is to instruct the scraper to visit the website containing the public proxy list. We'll use the "Browse Static Pages" module for this.
  • Drag and drop the "Browse Static Pages" module from the menu on the right onto the workspace.
  • In the module's text area, enter the URL of the website containing the proxy list.
    For example:
    https://hidemy.name/en/proxy-list/?type=hs#list
  • Click the "Test Node" button (the play icon) to see the module in action. The left panel will display the website content, confirming that the scraper can access the page.

3. Using the "Extract & Browse Pages" Module:

  • Now, we need to extract the individual proxy entries from the list. For this, we'll use the powerful "Extract & Browse Pages" module.
  • Drag and drop the "Extract & Browse Pages" module onto the workspace.
  • Connect its "Previous Action" socket to the "Next Action" socket of the "Browse Static Pages" module added in the previous step. This creates a flow, telling the scraper to first visit the website and then perform the extraction.
  • To extract the proxy information (IP address and port), you need to inspect the HTML structure of the proxy list website.
  • Right-click on the page in the left panel and select "Inspect" to open your browser's developer tools.
  • Use the developer tools to identify the HTML elements containing the proxy IP addresses and ports.
  • Once you've identified the correct HTML elements, you need to use CSS selectors to target them within the "Extract & Browse Pages" module.
  • For example, if the proxy IP addresses are contained within <td> elements with a class of "tdl", you would use the following selector in the "Selector" field of the "Extract & Browse Pages" module: td.tdl
  • Since we want to extract the text content of these elements, enter textContent in the "Attribute" field.
  • Repeat this process for the proxy port, identifying the correct selector and attribute.
  • Test the node to ensure the scraper is extracting the proxy IP addresses and ports correctly. The extracted data will appear in the "Data" tab of the left panel (a plain-JavaScript preview of the same extraction is sketched after this list).
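
For reference, the selector-and-attribute pairing maps directly onto plain browser JavaScript. Pasting something like this into the developer-tools console on the proxy-list page previews what the module should return (it assumes the td.tdl selector from the example above):

  // Run in the browser console on the proxy-list page.
  const ips = Array.from(document.querySelectorAll('td.tdl'))
    .map((cell) => cell.textContent.trim());
  console.log(ips);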

4. Data Transformation:

  • The extracted proxy data might contain unwanted characters or formatting issues. Stabler.tech allows you to clean and format this data using Javascript code.
  • Click on the "Open JS Editor" button (the Javascript icon) within the "Extract & Browse Pages" module.
  • In the editor, you have access to the extracted_content variable, which contains the raw extracted data. You can use Javascript functions to manipulate this data.
  • For example, to remove whitespace from the extracted proxy IPs, you could use the following code: return extracted_content.trim(); (a fuller cleanup is sketched after this list).
  • Similarly, you can use Javascript to format the extracted data into a more usable structure.
  • Save your Javascript code and test the node again to see the cleaned and formatted data.
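
Building on the trim() example, a fuller cleanup can validate each value before keeping it. This sketch assumes extracted_content arrives as a single IP string, as in the example above; adapt the regular expression for ports.

  // Inside Stabler.tech's JS editor; extracted_content holds the raw extracted text.
  const value = extracted_content.trim();
  // Keep only well-formed IPv4 addresses. How a null return is treated
  // depends on your configuration, so test the node after saving.
  return /^\d{1,3}(\.\d{1,3}){3}$/.test(value) ? value : null;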

5. Storage:

  • Finally, you need to choose how to store the extracted proxy list. Stabler.tech offers various options, including:
    • Google Sheets: Easy to use and accessible for collaboration.
    • MongoDB: A powerful database solution for larger datasets.
    • S3 Storage: Secure and scalable cloud storage.
  • Drag and drop your preferred storage module (e.g., "Data Store Mongo") onto the workspace.
  • Connect its "Previous Action" socket to the "Next Action" socket of the "Extract & Browse Pages" module.
  • Configure the storage module by providing the necessary credentials and specifying the database/sheet where you want to store the data.
  • Test the connection to ensure the scraper can successfully push the data to your chosen storage location.

Now you've successfully built a scraper to extract public proxy lists using Stabler.tech! In the next step, we'll launch the scraper and see it in action.

 

Step 4: Launching and Managing the Scraping Job

With your proxy scraper fully configured, it's time to set it loose and watch it gather those valuable proxies. Stabler.tech makes launching and managing your scraper incredibly simple:

1. Launching Your Scraper:

  • From your Stabler.tech Configurations page, locate the proxy scraper configuration you've built.
  • Click the "Run" button (the play icon) associated with your configuration.
  • A pop-up window will appear, allowing you to schedule the scraper to run immediately or at a specific time.
  • For immediate execution, select "Now."
  • Click the "Run" button to confirm and initiate the scraping job.

2. Monitoring the Scraper's Progress:

  • Once launched, your scraping job will appear in the "Extraction Jobs" tab.
  • Click on the job name to access the job details page.
  • The job details page provides valuable information about the ongoing scraping process:
    • Graph Status: A visual representation of your scraper's workflow, highlighting the status of each module (success, warnings, errors). Click on nodes for detailed logs.
    • Extraction Job Metrics: Key metrics such as job status (running, finished, errors), progress, duration, quality (error rate), and speed (nodes completed per minute).
    • Drillers Status: The status of your "drillers," the virtual machines executing the scraping tasks. This section indicates if drillers are actively scraping or idle.

3. Understanding the Logs:

  • Within the job details page, you can access detailed logs for each module by clicking on the corresponding node in the "Graph Status" section.
  • The logs provide a step-by-step account of the scraper's actions, including:
    • URLs visited
    • Data extracted
    • Errors encountered
  • Carefully review the logs for any warnings or errors. These insights can help you troubleshoot issues and optimize your scraper for better performance.

4. Downloading the Extracted Proxy List:

  • Once your scraping job is complete (indicated by a "FINISHED" status), you can access the extracted proxy list from your chosen storage location.
  • Google Sheets: Access your Google Sheet and locate the sheet where the data was stored.
  • MongoDB: Use a MongoDB client (e.g., MongoDB Compass) to connect to your database and access the collection containing the proxy data (a scripted alternative is sketched after this list).
  • S3 Storage: Access your S3 bucket and download the file containing the proxy data.
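
If you chose MongoDB, the stored list can also be pulled back with the official Node.js driver. The connection string, database, and collection names below are placeholders for whatever you configured in the storage module.

  // npm install mongodb
  const { MongoClient } = require('mongodb');

  async function loadProxies() {
    // Placeholder connection details - use the ones from your storage module.
    const client = new MongoClient('mongodb://localhost:27017');
    try {
      await client.connect();
      const proxies = await client.db('scraping').collection('proxies').find().toArray();
      console.log(`Loaded ${proxies.length} proxies`);
      return proxies;
    } finally {
      await client.close();
    }
  }

  loadProxies().catch(console.error);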

Additional Management Tips:

  • Pause or Stop Jobs: If needed, you can pause or stop ongoing scraping jobs from the "Extraction Jobs" tab.
  • Duplicate Configurations: Create copies of existing configurations to quickly set up similar scrapers with slight variations.
  • Schedule Regular Scrapes: Set up your proxy scraper to run periodically to ensure you always have a fresh supply of proxies.

By following these steps, you can efficiently launch, monitor, and manage your proxy scraping jobs on Stabler.tech. Next, we'll explore how to test and validate the proxies you've gathered to ensure they're reliable and ready for use.

Step 5: Testing and Validating the Scraped Proxies

You've successfully scraped a list of public proxies using Stabler.tech – congrats! However, your work isn't quite finished. It's crucial to test and validate these proxies to ensure they're functional, anonymous, and meet your specific requirements.

Here's a detailed guide to testing and validating your scraped proxies:

1. Functionality Check:

  • Online Proxy Checkers: Several websites offer free proxy checkers. Simply enter the proxy IP address and port into the checker, and it will attempt to connect through the proxy (a scripted alternative is sketched after this list).
  • Manual Testing with Browser Extensions: Install a proxy management extension for your browser, such as FoxyProxy for Firefox or Proxy SwitchyOmega for Chrome. Configure the extension to use the proxy you want to test and visit a website. If the website loads without issues, the proxy is functional.
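
If you prefer to script the check, this Node.js sketch sends a plain-HTTP request through the proxy to api.ipify.org (a public IP-echo service) and times the round trip. The IP it prints is the address the target site sees, which also foreshadows the anonymity check below. The proxy address is a placeholder.

  const http = require('http');

  // Placeholder proxy - substitute an entry from your scraped list.
  const PROXY_HOST = '203.0.113.10';
  const PROXY_PORT = 8080;

  const start = Date.now();
  // For plain HTTP, a forward proxy takes the full target URL in the request path.
  const req = http.get({
    host: PROXY_HOST,
    port: PROXY_PORT,
    path: 'http://api.ipify.org/',
    headers: { Host: 'api.ipify.org' },
    timeout: 5000,
  }, (res) => {
    let body = '';
    res.on('data', (chunk) => (body += chunk));
    res.on('end', () => {
      console.log(`Status ${res.statusCode} in ${Date.now() - start} ms; exit IP: ${body.trim()}`);
    });
  });

  req.on('timeout', () => { console.error('Timed out'); req.destroy(); });
  req.on('error', (err) => console.error(`Failed: ${err.message}`));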

2. Anonymity Check:

  • IP Leak Tests: These tests check if the proxy is successfully masking your real IP address. Use one of the online proxy checkers mentioned above and look for the "Anonymity Level" or "Proxy Type." Aim for proxies with high anonymity levels, such as "Elite Proxy" or "Anonymous Proxy." These proxies hide your real IP address and provide the highest level of privacy.

3. Performance Testing:

  • Proxy Speed Tests: Measure the latency and bandwidth of the proxy to gauge its speed and performance; various online speed-test tools can help with this.
  • Manual Testing: Visit websites or perform tasks that are representative of your scraping project while using the proxy. Observe the loading times and overall responsiveness to assess its suitability for your needs.

4. Filtering and Refining Your Proxy List:

  • Based on your testing results, filter out proxies that are non-functional, slow, or have low anonymity levels.
  • Create separate lists for different purposes, such as high-speed proxies for demanding scraping tasks and anonymous proxies for tasks requiring greater privacy.

5. Regular Re-Testing:

  • Public proxies can become unreliable quickly. Re-test your proxy lists regularly, especially before starting new scraping projects.
  • Consider automating the testing process using scripts or tools to save time and effort (a starting point is sketched below).
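
As a starting point for automating re-tests, the sketch below wraps the same single-proxy check used earlier into a reusable function and filters a whole list concurrently, keeping only responsive proxies sorted by speed. The proxy entries are placeholders.

  const http = require('http');

  // Check one proxy; resolves with its status and response time.
  function checkProxy(host, port, timeoutMs = 5000) {
    return new Promise((resolve) => {
      const start = Date.now();
      const req = http.get({
        host,
        port,
        path: 'http://api.ipify.org/',
        headers: { Host: 'api.ipify.org' },
        timeout: timeoutMs,
      }, (res) => {
        res.resume(); // drain the body; only status and timing matter here
        resolve({ host, port, ok: res.statusCode === 200, ms: Date.now() - start });
      });
      req.on('timeout', () => req.destroy());
      req.on('error', () => resolve({ host, port, ok: false, ms: Infinity }));
    });
  }

  // Placeholder list - feed in your scraped proxies.
  const proxies = [
    { host: '203.0.113.10', port: 8080 },
    { host: '203.0.113.11', port: 3128 },
  ];

  Promise.all(proxies.map((p) => checkProxy(p.host, p.port))).then((results) => {
    const working = results.filter((r) => r.ok).sort((a, b) => a.ms - b.ms);
    console.log(`${working.length}/${proxies.length} proxies responsive`);
    console.table(working);
  });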

Additional Tips:

  • Country-Specific Proxies: If your scraping tasks require proxies from specific countries, filter your proxy list accordingly.
  • Proxy Type: Different types of proxies (HTTP, HTTPS, SOCKS) are suited for different tasks. Consider the requirements of your scraping project when selecting proxies.

By meticulously testing and validating your scraped proxies, you'll ensure the success and reliability of your web scraping projects. Stabler.tech provides the tools to efficiently gather these proxies, but it's your careful validation that turns them into valuable assets for your data extraction efforts.

Conclusion

Scraping public proxy lists can be a game-changer for your web scraping projects, providing a cost-effective way to access a constant pool of IP addresses. Stabler.tech simplifies this process, offering an intuitive platform for building, launching, and managing your scraping tasks. From identifying reliable sources to testing and validating the results, Stabler.tech empowers you to take control of your proxy needs and elevate your web scraping capabilities.

Don't stop here! Explore Stabler.tech's rich set of features, including advanced data extraction techniques, anti-bot bypass mechanisms, and integrations with various storage options. The world of web data awaits - unlock its potential with Stabler.tech.
