Staff Site Reliability Engineer - Monitoring and Observability Worldwide
InVision is the digital product design platform used to make the world’s best customer experiences. We provide design tools and educational resources for teams to navigate every stage of the product design process, from ideation to development. Today, more than 5 million people use InVision to create a repeatable and streamlined design workflow, rapidly design and prototype products before writing code, and collaborate across their entire organization. That includes 100% of the Fortune 100, and organizations like Airbnb, Amazon, HBO, Netflix, Slack, Starbucks and Uber, who are now able to design better products, faster.
Our team is in search of a Staff Site Reliability Engineer - Monitoring and Observability to help us change the way digital products are designed.
About the Team:
The Observability team is part of the SRE organization and is responsible for developing, deploying, and operating the systems for collecting, storing, and visualizing metrics, as well as distributed logging, monitoring, alerting, and tracing. You will work closely with our engineering teams to design and build the next generation of systems monitoring infrastructure, with a focus on availability, performance, and efficiency at scale.
What you’ll do:
- Lead a team of senior engineers responsible for reliability and performance standards
- Perform deep dives into system and latent reliability issues, service performance, and capacity modeling of distributed systems at scale; work across the organization to produce and roll out fixes
- Identify opportunities to improve automation; scope and create automation for deployment, management, and visibility of our services
- Analyze complex problems in the application space relating to resilience
- Create operational tooling for monitoring and self-healing infrastructures
- Write code and participate in code reviews, primarily in Golang and Node
- Help guide architectural decisions and direct solutions that enhance our product reliability
- Partner with development to identify anti-patterns and create fallback experiences for critical scenarios
What you’ll bring:
- Experience building and deploying monitoring and observability systems
- Experience with statsd, Datadog, New Relic, Prometheus, or Grafana preferred
- 2+ years of experience with Golang, Node
- 2+ years of experience with enterprise-level infrastructure design, implementation, and support
- 2+ years of experience working in an AWS environment
- 2+ years of experience with application monitoring tools
- 2+ years of hands-on Kubernetes experience
- You are an experienced developer and comfortable with Golang and Node
- A degree in computer science, software engineering, a related field, or equivalent work experience
- Systematic problem-solving approach coupled with a strong sense of ownership and drive
- A passion for creating performant and reliable applications
InVision offers an incredibly unique work environment. The company employs a diverse team all over the world. Each InVision team member is given the freedom and tools to do their best work from wherever they choose.
The benefits we offer in the United States and Canada include competitive health plans and retirement plans. Some InVision-wide benefits offered to all employees across the globe include a flexible vacation policy, monthly coffee shop stipends, annual allowances for books related to your profession, and home office setup and wellness reimbursements. InVision is an international employer, so some benefit offerings will vary from country to country.
InVision is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or veteran status. If you have a disability or special need that requires accommodation, please let us know.
Vacancy page: https://boards.greenhouse.io/invision/jobs/1792758