Data Engineer Ireland

Company: Scrapinghub

About Scrapinghub:
Founded in 2010, Scrapinghub is a fast-growing, diverse technology business that turns web content into useful data with a cloud-based web crawling platform, off-the-shelf datasets, and turnkey web scraping services.
We’re a globally distributed team of over 130 Shubbers working from over 30 countries who are passionate about scraping, web crawling, and data science.

As a new Shubber, you will:
Become part of a self-motivated, progressive, multi-cultural team.
Have the freedom & flexibility to work remotely.
Have the opportunity to go to conferences and meet with the team from across the globe.
Get the chance to work with cutting-edge open source technologies and tools.

About the job:
Scrapinghub is looking for a Data Engineer to work closely with our Data Science team, assisting with dataset collection, cleaning, and post-processing. You’ll also help write tools for working with data as required for Data Science projects. You will work with one of the most advanced and comprehensive web crawling and scraping infrastructures in the world, leveraging massive datasets with cutting-edge technology.

Due to business requirements, the successful candidate must be based in Ireland.
Candidates within the EU who are willing to relocate will also be considered.

Job Responsibilities:

    • Create data tools for Data Science team members that assist them in building and optimizing our product.
    • Assemble large datasets that meet requirements set by the Data Science team, including creating web crawlers.
    • Be proactive in bringing forth new ideas and solutions to problems.
    • Be a strong team player and share knowledge freely and easily with your co-workers.
    • Write software for post-processing and cleaning of data, taking part in data analysis if required.
    • Automate manual processes, optimize data delivery, and improve architecture for greater scalability.
    • Work on integrating Data Science components into our larger systems.
    • Handle mid-size and large datasets (200GB+).

Job Requirements:

    • Python experience (3+ years)
    • 5+ years of software development experience
    • Good command of Linux
    • Front-end development experience (required for creating and supporting internal tools)
    • Back-end development experience (required for creating and supporting internal tools): Python web frameworks (e.g. Twisted, aiohttp, Django, Flask) and databases
    • Understanding of web technologies: JavaScript, HTML, CSS, HTTP
    • Strong analytical skills related to working with unstructured datasets
    • Excellent written English

Bonus points for:

    • Strong web crawling and web scraping skills: Scrapy knowledge and browser automation experience; Splash experience is a plus
    • Experience handling mid-size and large datasets and organizing their parallel processing
    • Good spoken English
    • Strong record of open source activity

Vacancy page: