Senior Backend Engineer for Cloud Services Worldwide
Founded in 2010, Scrapinghub is a fast-growing and diverse technology business that turns web content into useful data through open source projects such as the Scrapy web crawling framework.
We’re a globally distributed team of over 120 Shubbers who are passionate about scraping, web crawling and data science.
As a new Shubber, you will:
- Become part of a self-motivated, progressive, multi-cultural team.
- Have the freedom to work from wherever you want.
- Have the opportunity to go to conferences and meet with the team from across the globe.
- Get the chance to work with cutting-edge open source technologies and tools.
About the job:
At Scrapinghub, we are developing a next-generation platform for automatic crawling and extraction, combining state-of-the-art machine learning with microservices that let it scale.
The platform will be used directly by our customers via an API, as well as by ourselves for internal projects. So far our extraction capabilities include automated product and article extraction from single pages; we plan to expand this to whole-domain extraction and to more page types, such as job postings and news. The service is still in the early stages of development and is serving its first customers.
Our platform consists of several components communicating via Apache Kafka. Most components are written in Python, with a few implemented in Scala on top of Kafka Streams. The current priorities are improving the reliability and scalability of the system, integrating it with other Scrapinghub services, and adding new features such as auto-scaling. This is going to be a challenging journey for any good backend engineer!
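The component-and-message-bus design described above can be sketched in miniature. This toy example uses Python's standard-library `queue.Queue` as a stand-in for a Kafka topic, and the "crawler" and "extractor" stage names are purely illustrative, not Scrapinghub's actual service names:

```python
import queue

def crawler(urls, topic):
    """Fetch pages (simulated here) and publish them to the topic."""
    for url in urls:
        # In the real platform this would be a fetched page sent to Kafka.
        topic.put({"url": url, "html": f"<html>{url}</html>"})
    topic.put(None)  # sentinel: no more messages

def extractor(topic):
    """Consume pages from the topic and emit structured records."""
    records = []
    while (msg := topic.get()) is not None:
        records.append({"url": msg["url"], "size": len(msg["html"])})
    return records

# queue.Queue stands in for a Kafka topic connecting the two components.
pages = queue.Queue()
crawler(["http://example.com/a", "http://example.com/b"], pages)
results = extractor(pages)
```

The point of the pattern is that each stage only knows about the topic, not about the other stages, so components can be scaled, replaced, or rewritten (in Python or Scala) independently.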
Apply and join an excellent team of engineers and data scientists, including one of the world’s top-ranked Kaggle masters!
Job responsibilities:
- Design and implement the backbone of a large-scale web crawling and extraction platform.
- Interact with data science engineers and customers.
- Write careful code for critical production environments.
Requirements:
- Good knowledge of Python.
- An understanding of the CPU and memory costs of the code you write.
- Experience with a distributed messaging system (RabbitMQ, Kafka, ZeroMQ, etc.).
- Docker container basics.
- Linux knowledge.
- Good communication skills in English, plus strong learning skills.
- The ability to see the different ways a problem can be solved, and to choose wisely between a quick hotfix, a long-term solution, and a design change.
Bonus points for:
- Experience with Kafka Streams and microservices based on Apache Kafka, including an understanding of Kafka’s message delivery semantics and how to achieve them in practice
- An understanding of how the web works: research on link structure, the major components of the web link graph
- Algorithms and data structures background
- Experience with web data processing tasks: web crawling, finding similar items, mining data streams, link analysis, etc.
- Experience with microservices
- Experience with the JVM