News Feed Compiler
This project compiles a news feed from various news outlets covering the last week. It is written in Python with the Flask framework and stores its data in a SQLite database. The feed is generated by scraping articles from popular news websites, including CNN, BBC, Reuters, The Guardian, AP News, and Wikipedia News.
Requirements
Before running the project, ensure you have the following software installed on your system:
- Python 3.x
- Poetry
Installation
- Clone the repository to your local machine.
- Install the dependencies using Poetry:
poetry install
- Run everything.py to generate everything.csv, then run init_db.py to create the news_websites SQLite database.
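For readers who want a feel for the CSV-to-database step, here is a minimal sketch of how a CSV file could be loaded into SQLite using only the standard library. It is not the actual contents of init_db.py: the function name load_csv_into_sqlite, the table name articles, and the database path in the usage note are all assumptions made for illustration, and every column is stored as TEXT.

```python
import csv
import sqlite3


def load_csv_into_sqlite(csv_path, db_path, table="articles"):
    """Recreate `table` from the CSV header and insert every row.

    Hypothetical helper: the table name and all-TEXT columns are
    assumptions, not the schema used by init_db.py.
    Returns the number of rows inserted.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header is None:
            raise ValueError(f"{csv_path} is empty")
        # Skip malformed rows whose length does not match the header.
        rows = [row for row in reader if len(row) == len(header)]

    # Quote identifiers so column names containing spaces still work.
    def quote(name):
        return '"{}"'.format(name.replace('"', '""'))

    columns = ", ".join(f"{quote(name)} TEXT" for name in header)
    placeholders = ", ".join("?" for _ in header)

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(f"DROP TABLE IF EXISTS {quote(table)}")
            conn.execute(f"CREATE TABLE {quote(table)} ({columns})")
            conn.executemany(
                f"INSERT INTO {quote(table)} VALUES ({placeholders})", rows
            )
    finally:
        conn.close()
    return len(rows)
```

Under these assumptions, calling load_csv_into_sqlite("everything.csv", "news_websites.db") would rebuild the table from scratch on each run, so re-running it after a fresh scrape does not accumulate stale rows.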
Running the Project
To compile the news feed from the last week, follow these steps:
- Run the main Flask script:
poetry run python news_web.py
Note: The news.py file contains the link scraper function, and the individual news outlet scrapers are stored in separate files (e.g. BBC.py, CNN.py, Reuters.py, TheGuardian.py, APNews.py, WikiNews.py). The data is stored temporarily in CSV files (e.g. news_week.csv and theconversation.csv) before being merged into everything.csv.
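The merge of temporary CSV files into everything.csv could look roughly like the sketch below. It is an illustration only, not the code in everything.py: the function name merge_csv_files and the link column used to drop duplicate articles are assumptions, since the actual column layout of the CSV files is not documented here.

```python
import csv


def merge_csv_files(input_paths, output_path, key="link"):
    """Concatenate CSV files, keeping the first row seen for each `key` value.

    Hypothetical helper: the `link` column name is an assumption.
    Returns the number of rows written.
    """
    fieldnames = []  # union of all headers, in first-seen order
    merged = []
    seen = set()
    for path in input_paths:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            for row in reader:
                value = row.get(key)
                # Rows without the key column are always kept.
                if value is not None:
                    if value in seen:
                        continue
                    seen.add(value)
                merged.append(row)

    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        # restval fills columns a file lacked; extrasaction drops overflow cells.
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(merged)
    return len(merged)
```

With this sketch, merge_csv_files(["news_week.csv", "theconversation.csv"], "everything.csv") would write one combined file. Deduplicating on the article link is a design choice that guards against the same story being picked up by more than one scraper run.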
Disclaimer
This project is intended for educational purposes only. The scraping of news websites may violate the terms of service of those sites. Always review and comply with the terms of service of any website you intend to scrape. Additionally, excessive scraping can put a strain on the servers of the target website, leading to unintended consequences. Use this project responsibly and with the utmost respect for the websites you are accessing.