Webscraping project structure

As the number of my personal webscraping projects increased, I realized the importance of having a structured project template. It allows me to systematically create new projects without having to worry about refactoring later on if they don’t match my webscraping service. Because of this, I have decided to explain in this post how I structure most of my webscraping projects: 1. Considerations I like to keep each project as independent from the rest as possible....

December 4, 2023 · 3 min

Setting up a webscraping service

As a data engineer, it is quite common to be on the lookout for useful (or at least interesting) websites to scrape data from. However, different projects often start the same way (e.g. downloading the HTML from the internet) but end up processing it (ETL) in different ways. Because of this, I have set up a simple Raspberry Pi as my centralized webscraping service where: Websites, files, … can be downloaded into a raw format....
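
The “raw download” step the service is built around could look roughly like the minimal Python sketch below; the URL, output directory, and file-naming scheme are illustrative assumptions rather than the post’s actual implementation.

```python
from datetime import datetime, timezone
from pathlib import Path

import requests

RAW_DIR = Path("data/raw")  # illustrative output directory


def download_raw(url: str) -> Path:
    """Download a page and store the untouched HTML for later ETL."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()

    # Timestamped file name so repeated scrapes never overwrite each other.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_path = RAW_DIR / f"{stamp}.html"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_bytes(response.content)
    return out_path


if __name__ == "__main__":
    print(download_raw("https://example.com"))
```

Keeping the raw bytes untouched means the ETL side can be re-run or rewritten later without scraping the site again.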

December 2, 2023 · 18 min

Setting up my own Home lab: Software

With the homelab hardware set up, it is time to start preparing the machines so they can work together. As I intend to use my homelab as a safe and easy-to-use “IT playground”, I have decided that the following elements would need to be set up: A static IP address on each machine. This facilitates troubleshooting as well as communication between the machines. Remote control software and SSH. This facilitates troubleshooting my “remote” machines without having to hook a keyboard+mouse+monitor to them....
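
A static address plan plus SSH is easy to sanity-check from any machine on the network. The sketch below (hostnames and IP addresses are made up for illustration, not taken from the post) simply probes port 22 on each box:

```python
import socket

# Illustrative machines and static addresses; the actual IP plan is not shown here.
MACHINES = {
    "raspberrypi": "192.168.1.10",
    "homelab-node": "192.168.1.11",
}

SSH_PORT = 22


def is_ssh_reachable(host: str, port: int = SSH_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the SSH port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, ip in MACHINES.items():
        status = "reachable" if is_ssh_reachable(ip) else "unreachable"
        print(f"{name} ({ip}): SSH {status}")
```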

November 25, 2023 · 2 min · Daniel Aguirre