AWS Investigates Perplexity Over Alleged Web Scraping for AI Model Training

Amazon Web Services probes whether Perplexity employs ‘web scraping’ for AI training

Amazon Web Services (AWS) is investigating Perplexity, a company that uses AWS servers to train its artificial intelligence (AI) models, over allegations that it relies on web scraping. Web scraping is the process of collecting content from web pages using software that extracts the HTML code and filters the information for storage.
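To make the extract-and-filter idea concrete, here is a minimal sketch in Python that uses only the standard library. It is an illustration, not a description of how Perplexity or any other company actually collects data. The sample HTML, the ParagraphExtractor class and the extract_paragraphs function are hypothetical names invented for this example, and the page is an inline string rather than a real download.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text found inside <p> elements and ignores everything else."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self._buffer = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        # Start buffering text when a paragraph opens.
        if tag == "p":
            self._in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        # When the paragraph closes, normalize whitespace and store the text.
        if tag == "p" and self._in_paragraph:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._in_paragraph = False

    def handle_data(self, data):
        # Keep text only while inside a paragraph (the "filter" step).
        if self._in_paragraph:
            self._buffer.append(data)


# Hypothetical page content standing in for a downloaded web page.
SAMPLE_HTML = """
<html><body>
  <nav>Home | About</nav>
  <p>First <b>paragraph</b> of the article.</p>
  <p>Second paragraph.</p>
</body></html>
"""


def extract_paragraphs(html):
    """Return the paragraph texts from an HTML string."""
    extractor = ParagraphExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.paragraphs


paragraphs = extract_paragraphs(SAMPLE_HTML)
print(paragraphs)
```

In this sketch the navigation text is discarded and only the two paragraph strings are kept, which is the kind of filtering the definition above describes.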

Recent reports by developer Robb Knight and by Wired said that Perplexity had violated the Robots Exclusion Protocol on certain websites while scraping content to train its AI models. The Robots Exclusion Protocol involves placing a robots.txt file on a domain to indicate which pages should not be accessed by robots or automated crawlers.
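As a rough illustration of how a well-behaved crawler consults such a file, the sketch below uses Python's standard urllib.robotparser module. The robots.txt rules, the bot names (ExampleBot, OtherBot), the example.com URLs and the helper functions are all hypothetical, chosen only for this example. They are not taken from any site or crawler mentioned in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks "ExampleBot" everywhere and
# blocks all other crawlers from the /private/ section only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been retrieved."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, user_agent, url):
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# ExampleBot is disallowed from the whole site.
print(is_allowed(parser, "ExampleBot", "https://example.com/articles/1"))

# Other crawlers may fetch public pages but not /private/.
print(is_allowed(parser, "OtherBot", "https://example.com/articles/1"))
print(is_allowed(parser, "OtherBot", "https://example.com/private/notes"))
```

Compliance with robots.txt is voluntary: the file only states the site owner's wishes, and nothing in the protocol technically prevents a crawler from skipping this check.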

In response to these allegations, AWS has launched an investigation to determine whether Perplexity is complying with all rules and regulations while using AWS services to train AI. Perplexity has stated that its services do not violate AWS’s terms of service and that it respects robots.txt, except in rare cases where its bot ignores the file to retrieve specific information requested by a user.

Wired has confirmed that its investigation aligns with Perplexity’s explanation, stating that the company’s chatbot does ignore robots.txt in certain cases and collects information it is not authorized to access. AWS requires its customers to comply with its terms of service and applicable laws, and it will take appropriate action if the investigation finds any violations.
