Data Scientist Worldwide
Founded in 2010, Scrapinghub is a fast-growing and diverse technology business that turns web content into useful data through the use of our open source projects, such as the Scrapy web crawling framework. We’re a globally distributed team of over 120 Shubbers who are passionate about scraping, web crawling and data science.
As a new Shubber, you will:
Become part of a self-motivated, progressive, multi-cultural team.
Have the freedom and flexibility to work remotely.
Have the opportunity to go to conferences and meet with the team from across the globe.
Get the chance to work with cutting-edge open source technologies and tools.
About the job:
We are looking for experienced individuals who are passionate about data science and enjoy working in a collaborative environment. You will get the chance to work with one of the most advanced and comprehensive web crawling and scraping infrastructures in the world, leveraging massive data sets with cutting-edge technology.
Due to business requirements, only candidates based in Ireland will be considered.
Responsibilities:
- You will apply your data science and engineering skills to create products based on machine learning, analyze large volumes of complex data, model challenging problems, and develop algorithms to solve our internal and client needs.
- You will work and experiment with state-of-the-art web crawling, machine learning and data processing technologies. Some of the problems you’ll be working on include object detection, text classification, named entity recognition and crawling algorithms.
- You will work in collaboration with other data scientists and engineers across Scrapinghub to design and build creative solutions to challenging problems.
- You will work on projects that span the whole organization, including areas such as Product and Professional Services.
Requirements:
- Strong machine learning background (natural language processing, computer vision, deep learning, “classical” methods).
- Hands-on experience in data science projects (data preparation, target metrics, model evaluation, validation, etc.).
- Strong software development skills, ideally in Python.
- Experience with any of these tools is a plus: PyTorch, scikit-learn, TensorFlow, pandas, Jupyter, spaCy, Gensim, Vowpal Wabbit, CRFsuite, Scrapy, Spark, AWS, Docker, Kafka.
Please send source code that demonstrates your programming ability, along with a link to your Kaggle profile.
If you have many projects on GitHub (or similar), please tell us which ones we should look at.