Full-time Site Reliability Engineer openings in Chicago, United States on September 09, 2022

Site Reliability Engineer at Classkick

Location: Chicago

The Role

Classkick is growing, and so is the Platform team! This new role at Classkick is an opportunity to build nearly from scratch the capabilities required to keep our high-traffic, realtime, distributed platform healthy and available.

We’re seeking a Site Reliability Engineer who has dreamed about having the chance to own the whole picture. If you’re excited by the prospect of defining, implementing, growing, and owning system-wide reliability, observability, and related automation, look no further. With your ownership you’ll build the capabilities necessary to measure what matters, act on findings, define our SLOs, and collaborate widely across the company to proactively support healthy scaling. Your impact will ensure our more than 700,000 daily active students and educators across the world are happy and productive for years to come.

What you’ll do
• Proactively assess and address system reliability gaps and issues.
• Build out and grow comprehensive monitoring/alerting for all aspects of the platform.
• Automate tasks that can be automated.
• Improve standards, tools, processes relevant to the facets of system reliability.
• Collaborate in designing our future systems: provide expert guidance towards highly available, scalable, fault-tolerant implementations.
• Define engineering standards which will support your work: app-level instrumentation & logging, etc.
• Take part in on-call rotation.
• Participate in incident management: collaborate in root cause analysis, remediation, and prevention.

Outcomes you’ll meet
• Assess and report on the current state of Classkick reliability/observability monthly: identify and prioritize needs at each platform grooming.
• Collaborate with the VPE and Platform team in defining a reliability roadmap for the next 6 months.
• In your first 60 days, determine present state SLOs and define SLO goals for 6 months out.
• In your first 60 days, define and realize platform reliability KPIs.

Why you’ll be awesome
• You’re a self-starter and self-directed learner who loves learning and growing.
• You’re great at what you do, and pride yourself on doing it with quality.
• You have an eye on timelines and competing priorities, and know when to make tradeoffs.
• You are a proactive and effective communicator.
• Efficiency and clarity make you happy, and you strive for them.
• You love working in a supportive atmosphere of amazing people that treat you like a human first.
Must haves:
• Experience as an SRE in a cloud environment at scale (i.e. near or well above 700k daily active users).
• Experience leading or overseeing large projects from start to finish.
• Are passionate about improving education.
• Have a strong desire to perform and grow as an engineer.
Nice to haves:
• Familiarity with Google Cloud Platform.
• Experience in DevOps.
• Experience with chaos engineering, and/or load testing at scale.
• Experience working with distributed system components (for example caching, job-queuing, messaging, or other).
• Familiarity with any of the following programming languages: Python, Ruby, Scala, Java.

Classkick’s stack: iOS, Angular, React, Python, Ruby, Scala, GCP, Firebase, MySQL, Kubernetes. Prior familiarity with the stack is a bonus; it’s NOT required.

About Our Team

Classkick is a tech startup that loves helping teachers and students. We love it so much that every month, our team joins classrooms, in person or virtually, to see how teachers and students use Classkick in real life. We enjoy hearing feedback and gaining more insight into how our product is valued in classrooms around the world.

Ed-tech is still young. We believe that by building a great product with a great business model, we will create a lasting impact on student learning, teacher effectiveness, and the industry at large. Our customer-centric culture helped us make the #1 learning activities platform in K-12 (measured by mean session time and/or NPS), with organic growth in 170+ countries.

Our team has described our culture as values-driven, collaborative, transparent, and empowering. We are lucky to serve students all over the world, and we don’t take that responsibility lightly. We are committed to diversity and building an equitable and inclusive environment. Our products will only benefit all students if built by people with a diverse set of backgrounds, experiences, and opportunities. We especially encourage members of traditionally underrepresented communities to apply for open roles, including women, historically excluded people of color, LGBTQ+ people, veterans, and people with disabilities.

And, while this is a startup and everyone works very hard, we believe in a work-life balance. More at classkick.com, instagram.com/classkick or twitter.com/classkick.

Classkick’s Core Values
• Advocate for Learners
• Focus on People
• Work Towards Justice
• Stay Curious
• Dream

Why this opportunity matters

Every day, kids come to class feeling engaged, supported and excited to learn because of Classkick. Our goal is that every student in the world is happy and successful in their education.

To achieve this, we connect students and their school work to all the advocates in their learning: classmates, teachers, parents, anyone invested in them. When the right person is there at the right time – whether with a high five, or some help when stuck – that’s when you get more “Ah ha!” and “OH I got it!” moments. Learning transforms from a chore into a fun and intrinsic experience.

After the pandemic, we suddenly expanded from being a major tool in the classroom, to becoming the classroom itself, for both in-person and hybrid learning. Joining at this stage is pivotal and will have high impact on the future direction of the company.


Site Reliability Engineer at Morningstar, Inc.

Location: Chicago

In this role, you will play a vital part on the operations team for our next generation of data services artifacts. Under the leadership of the Operations Manager, you will be responsible for creating and supporting processes that ensure reliable and timely data delivery to Morningstar products.

In order to rapidly deliver business value, you will also adopt and support cutting-edge data concepts such as data lake, data analytics and distributed data calculation to fuel our data development process. Moreover, you will leverage engineering skills and operational insights to establish and advocate operational excellence and collaborate with diverse teams to contribute to initiatives that bring data products and services operations to the next level.

Job Responsibilities
• Help lead the corporate operations management initiatives based on best practices such as CI/CD, monitoring everything, infrastructure automation, operations readiness review.
• Build world class data operations by establishing ITIL deployment, problem / incident management and continuous improvement processes.
• Provide on-call technical triage and troubleshooting by understanding and analyzing financial data systems.
• Support data systems request fulfillment such as access management, ETL configurations.
• Lead miscellaneous operation projects across teams such as DR, security compliance, AWS resources management.
• Drive automation and innovation for proactive and continuous operations improvement by new technology research and tools development.
• Be a focal communication contact to collaborate with our overseas offices for projects, knowledge transfer and on-call rotation.

• Bachelor’s degree or higher.
• Experience in Python or other scripting languages.
• Experience with AWS: S3, SNS, SQS, DynamoDB, Glue, Spark.
• Knowledge of monitoring, alerting and deployment tools: Splunk, CloudWatch, New Relic, Jenkins/CloudBees, CloudFormation.
• Knowledge of DevOps, ITIL is a plus.
• Experience operating data warehouse or data lake related products is a plus.
• Excellent interpersonal, organizational and communication skills.

Nice to have
• AWS Certification(s)
• ITIL Certification


Morningstar and its subsidiaries are an equal opportunity/affirmative action employer. All qualified applicants will receive consideration for employment without regard to race, color, ancestry, religion, sex, national origin, age, disability, protected veteran status, marital status, sexual orientation, genetic information, citizenship, gender identity, parental status, or other legally protected characteristics or conduct.


Site Reliability & DevOps Engineer at VMware

Location: Chicago

As part of the Workspace ONE FedRAMP Access and Intelligent Hub service team, you will help build, deploy, and operate the Workspace ONE Access and Intelligent Hub SaaS services platform hosted on AWS GovCloud. You will work with our team to continuously improve how we rapidly and securely deliver high-quality software services to VMware’s US Local, State, and Federal Government customers, working with a variety of AWS-managed services (e.g., RDS Aurora, DynamoDB, Elasticsearch, SQS, SNS, CloudFormation, Lambda) in addition to deploying containerized microservices. You will also have the opportunity to collaborate with the Access and Intelligent Hub Engineering team to help introduce improvements to the product across the full software development lifecycle, including identifying opportunities to reduce operational burdens and increase observability and monitoring through code and automation.

If you enjoy working with automation to continuously improve the delivery and operation of critical, secure SaaS services to support US Government customers, then this is the role for you.

Success in the Role: What are the performance outcomes over the first 6-12 months you will work toward completing?

Within your first 6 months:
• You will be expected to contribute towards SRE & DevOps-related automation efforts.
• You will contribute to the incident response process and work towards improving the stability of underlying software components with an automation mindset.
• You will assume ownership of 1 or more DevOps/SRE functions by becoming a subject matter expert in it and working across engineering teams. Examples of functions are CI, CD, Infra-as-code, Observability, Security, Compliance, Incident response automation, etc.

After 6+ months:
• You will be expected to design and propose improvements in existing cloud operations and SaaS delivery models for AWS-hosted cloud services.
• You will be expected to design and propose next-generation DevOps/SRE solutions to support and sustain higher scale requirements, work towards a proof of concept, and drive adoption of newly developed solutions across engineering teams.
• You will embody an automation-first mindset and will act as a guiding force across DevOps, SRE, and engineering teams to achieve the 100% automation goal.
• You will be responsible for improving the reliability and resiliency of microservices by enforcing DevOps/SRE best practices across the engineering org.
• You will adopt and evangelize a mindset of full-stack observability and implement monitoring, logging, and tracing functions across all DevOps/SRE functions to provide a system where production incidents are prevented by detecting and fixing issues before they reach production systems.

The Work: What type of work will you be doing? What assignments, requirements, or skills will you be performing on a regular basis?

As a new member of the DevOps and SRE Team, you will:
• Work with the cloud team, service team, and product management to review and refine production readiness requirements
• Deploy microservices to AWS GovCloud using Continuous Integration and Continuous Deployment (CI/CD) systems
• Write and review automation supporting system deployment and operations
• Participate in our on-call rotation, providing operational support to components owned by DevOps and SRE teams
• Design and create the feedback loop to provide information from our running systems back to development to help improve performance, security, scalability, and feature usage
• Identify and implement improved feedback loops early in the development process to fix issues before they reach production.
• Design, document, and hold reviews of proposed technical solutions
• Research and help select the technology upon which our next-gen deployment platform will be based
• Participate in all team scrum meetings and demos


Senior Site Reliability Engineer, Tock at Squarespace

Location: Chicago

The Gig

The Tock engineering team is looking for a site reliability engineer to help us support the next generation of restaurant bookings based on the system we’ve deployed everywhere from local dive bars to Alinea, The French Laundry, and Eleven Madison Park. Our engineering team was founded in 2015 by ex-Google engineers and combines FAANG team quality with the speed and personal impact of a startup. As a Site Reliability Engineer, you will work with the rest of our Engineering team to ensure our products and infrastructure are reliable, fast, efficient, and secure with an eye to reducing toil.

This is an opportunity for you to work with world-class engineers and SREs on challenging problems while having a big impact on the hospitality industry. As a member of a growing team, you’ll help define the next stage of our growth.

Tock Engineering takes a hybrid approach to work. Qualified engineers can choose to work from our offices, from home, or a combination of the two.

You will report directly to the Senior Director of Engineering at Tock.

You’ll Get To…
• Work in our complex production environment and seek out ways to simplify systems and reduce toil.
• Accelerate our adoption of Config as Code and Infrastructure as Code.
• Troubleshoot production incidents and debug across distributed systems and at multiple layers (including network, system, and application).
• Define and measure Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to help teams make informed decisions about balancing reliability against engineering velocity.
• Participate in and contribute to a culture of direct, compassionate feedback.
• Contribute to team culture.
• Shape the evolution of a large system.

Who We’re Looking For
• At least 5 years of experience in an SRE or DevOps role
• Experience supporting HA products / infrastructure on public clouds using Kubernetes (AWS / GCP). We use GCP.
• Familiarity with using Config as Code and Infrastructure as Code to manage configuration and cloud infrastructure. We use Terraform, Ansible, and Helm.
• An understanding of all parts of a modern web-based application stack including frontend, backend, database, and networking.
• Fluency with one or more general purpose programming languages which could include: Java, Go, Bash, Python, Ruby, JavaScript, or TypeScript.
• Familiarity with security best practices.
• You are someone who:
• Works well independently
• Enjoys being part of a team that learns together
• Values making and building things as part of a growing team
• BA/BS or greater in computer science or a related field is a plus, but not required

Benefits & Perks
• Health insurance with 100% premium covered for you and your dependent children
• Flexible vacation & paid time off
• 401k with employer match
• Paid parental leave
• Fertility and adoption benefits
• Education reimbursement
• Commuter benefit in the form of pretax (US)
• Employee Assistance Program
• Backup Dependent Care
• Subscription to mindfulness app, Headspace
• Charitable donation match

Tock’s Growth
• Awarded Fast Company’s “Most Innovative Companies” in 2021
• Awarded Built In’s “Best Places to Work” in 2020, 2021, and 2022
• Awarded America’s Hottest Brands of 2020 by AdAge
• Won Chicago Tribune’s “Game Changer” Award for industry innovation
• Reached a global customer base of 30 countries operating in 200+ cities
• Processed over $1 billion in prepaid reservations
• Named one of 2019’s 50 Startups to Watch
• Featured in: New York Times, Bloomberg, GQ, Vice, Wired, Food & Wine, Eater, Skift Table, Chicago Tribune, Crain’s Chicago Business, New York Post, and more

About Tock

Tock is the all-in-one system for reservations, takeout, delivery, and events. We are changing the way restaurants, wineries, and culinary event organizers run their business and how guests explore, discover, and book at these places all around the globe.

About Squarespace

Squarespace is a leading all-in-one website building and ecommerce platform that enables millions to build a brand and transact with their customers in an impactful and beautiful online presence. Our suite of products enables anyone at any stage of their journey to manage their projects and businesses through websites, domains, ecommerce, marketing tools, and scheduling, along with tools for managing a social media presence with Unfold and hospitality business management via Tock. Squarespace democratizes access to best-in-class design, helping our customers in approximately 200 countries and territories maintain consistent branding across all digital touchpoints to stand out online. Our team of more than 1,400 is headquartered in downtown New York City, with offices in Dublin, Ireland; Portland, Oregon; Los Angeles, California; and Chicago, Illinois. For more information, visit

Our Commitment

Not only do we embrace and celebrate the diversity of our customer base, but we also strive for the same in our employees. At Tock, we are committed to equal employment opportunity regardless of race, color, ethnicity, ancestry, religion, national origin, gender, sex, gender identity or expression, sexual orientation, age, citizenship, marital or parental status, disability, veteran status, or other class protected by applicable law. We are proud to be an equal opportunity workplace.


Site Reliability Engineer at Interactive Brokers LLC

Location: Chicago

Join the Interactive Brokers Team

Interactive Brokers Group has been consistently at the forefront of trading innovation, starting with the invention of the first floor-based handheld computer in 1983, and we pride ourselves on being primarily a technology company. We continue to challenge the status quo and push boundaries to offer the best trading platform with the most sophisticated features, all for the lowest cost to our customers. Software development is the lifeblood of our firm, and it shows in our stellar brokerage platform. We offer award-winning desktop, mobile and web applications which provide our clients with the tools they need to be successful. Interactive Brokers Group, Inc. (IBKR) has been rated #1 Best Online Broker by Barron’s 4 years in a row.

About the role

As a global technology leader in Financial Services, IBKR maintains tens of thousands of individual IT components and millions of dollars of infrastructure supporting the business. Inside our global IT operations centers, these systems, networks, processes and infrastructure are monitored 24/7/365, ensuring platform stability and proper function. We are searching for an SRE / IT Operations Engineer to work within the technical operations group and support our technical operations analysts through automation and tooling.

Your Responsibilities
• Take ownership of software tooling and configuration management software and infrastructure.
• Maintain and improve reliability of production services by developing innovative monitoring and scaling solutions to measure system health and automate resilience.
• Continuously improve hybrid, on-premise and cloud infrastructure to support development teams throughout the full service lifecycle.
• Support teams across the organization with connectivity configuration and file delivery automation using software development, networking and cyber security knowledge.
• Provide second-level support for incident management across the brokerage system and create corrective action plans through collaboration with problem managers using post-mortem best practices.

Key Requirements
• Bachelor of Engineering or equivalent relevant technical experience in Computer Science, Software Engineering, Mathematics, Physics or similar.
• Experience developing applications in any of the following languages: Python, C++, Java or Kotlin.

Preferred
• Practical background with Linux-based systems and networking.
• Experience working with and managing configuration for monitoring / alerting systems (Prometheus, Grafana, Kibana, ElasticSearch, Logstash, AlertManager).
• Infrastructure configuration and management with AWS and supporting cloud technologies (CloudWatch, Terraform, Kubernetes, Lambdas/Functions).
• Experience with DevOps technologies such as Docker / Docker-Compose, software build/packaging systems (Gradle, SetupTools, CMake, Make), dependency management (pip, maven, NPM etc.).
• Knowledge of file transfer protocols (SFTP, FTP), certificate management and modern encryption standards.
• Experience working in a software development team using supporting development tools (Jira, Git, Gitlab, Jenkins) and best practices (test coverage, unit & integration testing, linting).
• Experience with ITIL best practices and collaboration tools such as Jira, Confluence and ServiceNow.

Company Benefits & Perks
• Competitive salary, annual performance-based bonus and stock grant
• Retirement plan 401(k) with competitive company match
• Excellent health and welfare benefits including medical, dental, and vision benefits
• Wellness screenings and assessments, health coaches and counseling services through Employee Assistance Program (EAP)
• Paid time off and a generous parental leave policy
• Daily company paid lunch and a fully stocked kitchen with healthy options for breakfast and snack
• Corporate events including team outings, dinners, volunteer activities and company sports teams
• Education reimbursement and learning opportunities
• Modern offices with multi-monitor setups

Company Overview

Interactive Brokers LLC, a subsidiary of Interactive Brokers Group, Inc. (Ticker: IBKR), is a direct access electronic broker serving professionals, frequent traders, institutional investors, financial advisors and introducing brokers. Our clients have access to more than 150 market centers around the world from a single integrated account. Our employees are part of a dynamic, multi-national, fast-paced, results-oriented team that has spent four decades focused on advanced technology and automation that equips our clients with a uniquely sophisticated platform to manage their investment portfolios. We provide our clients with advantageous execution prices, risk and portfolio management tools, research facilities and investment products, at low or no cost, positioning them to achieve superior returns. Headquartered in Greenwich, CT, USA, IBKR has offices in more than 15 countries across the world. IBKR is a member of NYSE, FINRA, and SIPC. Interactive Brokers Group brokerage affiliates are regulated by securities and commodities agencies around the world.

Click the link to view a short video with a few words from current Interactive Brokers employees: https://www.interactivebrokers.com/en/index.php?f31899


Site Reliability Engineer SRE at Request Technology

Location: Chicago

Salary Information: 120-150K + Bonus
Reference #: CJ-SREnj
Visa Requirement: US Citizenship / Permanent Resident
Recruiter Email: craig@requesttechnology.com
Recruiter Name: Craig Johnson
Location Type: Berkeley Heights, New Jersey

Overview
• We are unable to sponsor for this permanent full-time role
• Position is bonus eligible

A prestigious Fortune 500 company is currently seeking a Site Reliability Engineer. The candidate will provide operational support for the products to meet SLOs and SLAs.


• Working closely with development teams to implement and improve SLIs and SLOs for their services.
• Identifying and developing processes, tools, automation, infrastructure improvements and software changes to address top operational issues.
• Exerting technical influence to shape the implementation of products and establishing strong operational readiness across teams.
• Utilizing hands-on technical skills to partner with team members and be comfortable diving into the fray as needed.
• Diagnosing complex problems, developing metrics to measure them, and implementing monitoring solutions to manage them.
• Building automation and systems to maintain software and hardware lifecycle management.
• Using your programming experience to reduce toil.

• 5+ years in a Reliability Engineering, DevOps, or Infrastructure focused role
• Strong experience with a scripting language (Python, PowerShell, JavaScript)
• Passion for designing and building reliable systems
• Automation advocate with an automation-first approach
• Experience with deploying, supporting, and monitoring new and existing services, platforms, and application stacks
• Strong experience supporting customer-facing applications on Windows/Linux platforms.
• Monitoring experience leveraging Splunk, ExtraHop, Prometheus, etc.
• Strong fundamental understanding of Networking and Security
• Knowledge of TCP/IP networking, architecture, and core technologies (such as DNS, DHCP, HTTPS).
• Excellent communication skills, written and verbal, to share your knowledge, teach what you know, and learn new ways of doing things from your team.

Preferred Skills:
• Demonstrated experience building or maintaining highly available systems at scale.
• Experience with CI/CD pipelines that support a SaaS product.
• Experience with capacity planning practices or methodologies.



Site Reliability Engineer at LAVORO

Location: Chicago


As an integral part of our Global Cloud & Platform Team, you are an SRE who can help our in-house technology teams build, support, and improve their suite of applications and systems. You will help to optimize the path to production and provide the automation skills to build monitoring, alerting and self-healing capabilities into our applications and services. You must be passionate about finding and resolving problems, no matter where they occur in the stack, and one who strives to make sure they don’t happen again.

• Support production and UAT systems
• Perform some 2nd line support and triage
• Resolve 3rd line issues
• Take an issue and remediate, follow it through to production
• Management of Continuous Integration environments
• Help support pipeline tools such as Nexus, Jenkins and SonarQube
• Managing Application & Development Environments
• Across a mixture of physical and cloud-based environments
• Develop systems
• Work on fixes and improvements of existing systems
• Work on shared tools, libraries, and platforms
• Increase maturity in Continuous Delivery
• Working with development teams, guide them to use engineering best practices and ensure their applications meet the Company Engineering and Company security standards
• Recommend changes to engineering teams based on best practice
• Optimizations
• Database usage patterns
• Data modelling
• Work on shared knowledge, scripts and tools
• Create documentation to drive learning and identify repeatable actions which can be automated
• Facilitate Technical Roadmaps
• Be involved in upgrade and maintenance work across systems
• Resolve technical blockers and liaise with Infrastructure teams
• Mentor junior members of the team
• Ensure that all customers/stakeholders are treated fairly in line with Company’s principles on Customer Experience, Everyday Matters, Our Strategy, Employee Engagement, Continuous Improvement and TCF policy


Skills Required
• Continuous Delivery
• Coding best practices
• Agile methodology (Scrum)
• Software testing principles
• Technical specialism within the Team’s selection of technology
• Software Architecture
• Information Security
• Must be able to demonstrate a collaborative approach with all members of the team

Essential Skills
• Strong Java skills (Java 8+)
• Linux and networking fundamentals
• Experience of containerization, ideally using Docker
• Experience of continuous integration tooling, ideally using Jenkins, Maven, Nexus & SonarQube
• Experience with AWS; EC2, RDS(Aurora), Elasticache and other managed services
• An appreciation of Java / JVM-based application internals and debugging
• Monitoring with Prometheus & Grafana or similar
• Experience across the entire stack: hardware, application, security and network.
• Experience with continuous deployment

Desired Skills
• Supplementary languages (.NET, Python, Go, Julia)
• Knowledge of container orchestration using Kubernetes and Openshift
• Experience of agile software development methodologies and environments

• Ability to work to deadlines & be delivery focused
• Determination to find problems and fix them properly
• Must enjoy building scalable/resilient software
• Must be passionate about automation and self-healing
• Willingness & aptitude to learn new skills and to contribute to the team’s continuous improvement ethic
• Must demonstrate a self-directed, self-motivated and pro-active attitude
• Must have strong communication & presentation skills
• Flexible attitude to working hours and home working
• Ability to work with multiple departments, including IT Security, Infrastructure, Development and Production Services


Senior Site Reliability Engineer at BDO

Location: Chicago

Senior Site Reliability Engineer – Remote, Chicago, Illinois, 27368BR

Job Summary

BDO’s Core Purpose is Helping People Thrive Every Day. Our Core Values reflect how we manage our work, our relationships and ourselves. As an employee of the firm, you will live true to our Core Values of people first, being exceptional every day in every way, embracing change, feeling empowered through knowledge and choosing accountability. Our Core Values are the standards by which we conduct ourselves day in and day out, both internally and externally.

BDO Digital, LLC, a subsidiary of BDO USA, LLP, provides a holistic portfolio of technology, transformation services and solutions. We are an organization that values your time, talent, and contributions. Collaborate with BDO Digital’s cross-disciplinary team who work together to solve digital needs and unearth new opportunities to drive competitive advantage. Our commitment to each other is why BDO Digital is a recognized leader for our culture, employee satisfaction and career growth. We’re looking for people with the same drive; to combine teamwork with technology to produce amazing results.

The Senior Site Reliability Engineer will work with cutting edge technologies across various industries to deliver world-class support and operational stability to our clients. This role will work side by side with our custom application development specialists to build holistic observability, alerting, and self-healing capabilities into every application we deliver. The Senior Site Reliability Engineer is also responsible for technical oversight of a team of junior Site Reliability Engineering (SRE) resources and training them in best practices.

As there are several cities listed within this job description, we would consider candidates nationwide and able to work remote as long as willing to travel to client site as needed.

Essential Duties/Functions

Site Reliability:
• Continuously improves the production experience for several BDO clients
• Establishes and enforces observability standards across custom application development projects
• Adds telemetry and monitoring to application code according to standards and best practices

Automation:
• Facilitates with application developers to create self-healing capabilities for common cloud issues

Incident Management:
• Collaborates with the application team and managed services team to proactively manage client incidents
• Participates in Root Cause Analysis investigation and authoring

Implementation and Support:
• Software deployments and ongoing software support
• Non-business hours support oversight, as required by clients
• Other duties as required

Development of Others:
• Provides technical oversight of a team
• Provides coaching and training to develop team members

Qualifications

Education:
• Bachelor’s degree from an accredited university, required
• Major with focus in Computer Science, preferred

Experience:
• Four (4) or more years of experience with below scripting languages, required
• Two (2) or more years of experience with Application Performance Monitoring (APM) tools, required
• Professional experience optimizing and maintaining cloud native deployments, required
• Some professional experience working with software application development, required
• Ability to write and optimize SQL queries, required

Software:
Experience with one of the following in each category, required:
• Programming: C#, Java, Scala, Kotlin, .NET
• Scripting Languages: Python, PowerShell, Ruby, Perl
• Performance Monitoring: Azure Application Insights, Azure Monitor, DataDog, ELK, Dynatrace, or any other APM tool
• Source control: GitHub, Gitlab, Azure Repos, or any other Source control
• Continuous Integration/Development (CI/CD) Pipeline experience: GitHub Actions, Azure Pipelines, Gitlab Runners, or any other CI/CD tool
• Cloud Native Deployments: Familiarity with Azure, AWS, GCP or any other Cloud tool, preferred

Other Knowledge, Skills & Abilities:
• Experience within a consultative environment
• Strong written and verbal communication skills
• Must be open to travel to client sites, if needed

Keywords: SRE, site reliability, python, DevOps, PowerShell, Monitoring, Observability, DataDog, ELK, Dynatrace, AMP, AWS, Cloud, Azure, GCP, GitHub, GitLab, Pipelines, Runners, C#, .Net, Java, Python, Ruby, Perl, Application Insights

Multiple Locations: Akron, Anchorage, Atlanta, Austin, Baltimore, Boca Raton, Boston, Charlotte, Cherry Hill, Cincinnati, Cleveland, Columbia, Columbus, Columbus (BSC), Coral Gables, Dallas, Des Plaines, Detroit, Fort Lauderdale, Fort Lauderdale 301, Fort Worth, Gardner, Grand Rapids, Grand Rapids (BSC), Greater Philadelphia, Greater Washington D.C – Potomac, Greater Washington D.C. – McLean, Greenville, Harrisburg, High Point_BDO Collections, Houston, Indianapolis, Jacksonville, Kalamazoo, Lakeland, Laramie, Las Vegas, Las Vegas – 6100 Elton, Long Island, Los Angeles, Madison, Madison 8383, McLean, Memphis, Miami, Miami – Brickell, Milwaukee, Minneapolis, Nashville, New York, New York 600, New York 622, Norfolk, Oak Brook, Orange County, Orlando, Philadelphia, Phoenix, Pittsburgh, Raleigh, Reno, Richmond, Rosemont, Salt Lake City, San Antonio, San Diego, San Francisco, San Jose, San Ramon, Seattle, Seattle 601, Spokane, St. Louis, Stamford, Tampa Bay, Tulsa, Valhalla, Washington, DC, West Palm Beach, Wilmington, Winter Haven, Woodbridge


Senior Site Reliability Engineer- Remote at HealthEquity, Inc.

Location: Chicago

We are looking for a passionate Senior Site Reliability Engineer to join our team in Draper, Utah. Our team is responsible for driving scalable architecture, minimizing risks, and providing visibility across a multitude of environments, systems and applications while using lean principles at scale in a fast-paced environment. You’ll contribute to the design and documentation of systems, in collaboration with scrum teams, looking for opportunities to automate away waste. You’ll work with scrum teams to troubleshoot complicated systems and applications and will take part in an on-call rotation.

What you’ll be doing
• Work with teams to design and implement automated code deployment solutions
• Work with teams to design and implement automated environment provisioning and container solutions
• Work with teams to design and implement application monitoring and alerting solutions to get issues to the right people at the right time
• Work with teams to remediate issues that impact the health and performance of our production systems and infrastructure
• Work with teams to diagnose and isolate issues at all layers of the stack, whether it be code or infrastructure, during development and in production
• Manage build definitions and hardware in support of our Continuous Delivery policies and procedures

What you will need to be successful
• Bachelor’s degree in CS/Engineering or equivalent experience
• 8+ years of experience in a DevOps, SRE, or IT Operations position
• 2+ years of experience writing SQL queries and stored procedures
• 2+ years of experience developing in .NET and C#
• Demonstrated interpersonal skills and ability to collaborate with product owners and development teams
• Demonstrated ability to context switch while still delivering on commitments
• Ability to troubleshoot complex systems and environments
• Experience with CI/CD concepts and tooling
• Knowledge of full stack monitoring concepts and tooling from code to system resources
• Experience with containerization design concepts and tooling

Benefits & Perks
• Medical, Dental, Vision
• HSA contribution and match
• Dependent Care FSA match
• Unlimited Paid Time Off
• 401(k) match
• Paid Parental Leave
• Ongoing Education & Tuition Assistance
• Gym/Fitness Reimbursement
• Award Winning Wellness Program

Come be your authentic self

Why work for HealthEquity

HealthEquity has a vision that by 2030 we will make HSAs as widespread and popular as retirement accounts. We are passionate about providing a solution that allows American families to connect health and wealth. Join us and discover a work experience where the person is valued more than the position.

HealthEquity, Inc. is an equal opportunity employer that is committed to inclusion and diversity. We take affirmative action to ensure equal opportunity for all applicants without regard to race, age, color, religion, sex, sexual orientation, gender identity, national origin, status as a qualified individual with a disability, veteran status, or other legally protected characteristics. HealthEquity is a drug-free workplace.


Senior Site Reliability Engineer at Infusionsoft

Location: Chicago

We are the Keap Site Reliability team and we’re looking for a Senior Site Reliability Engineer (SRE) to help build automation for networked systems to increase the simplicity, consistency, security, and availability of our Keap platform. We’re looking for someone who’s passionate about helping small businesses succeed and who enjoys building tools and systems for scaling web and SaaS infrastructure.

The Work
• Building automation for networked systems to increase simplicity, consistency, security, availability, and scalability
• Automating software delivery to cloud
• Building and testing tools to help build new web based software/hardware environments
• Creating and configuring monitoring and metrics
• Deploying and monitoring releases of code to systems
• Creating an environment of end-to-end ownership where teams deploy and monitor
• Evaluating current and proposed compute platforms for high availability and scalability
• Working with developers, systems architects and engineers to build new SaaS products for Keap

• Experience in automating cloud infrastructure
• Experience building continuous delivery pipelines
• Expertise in Linux system administration
• Experience designing, developing and deploying A+ provisioning and automation systems for web servers at scale
• Strong operations experience supporting systems in public and private cloud deployments
• Strong system and tools development skills (bash, python, ruby, golang, java)
• Experience using tools to deploy and deliver software and systems (Terraform, Packer, Ansible, Docker, Vagrant, Puppet)
• Loves technical challenges and analyzing problems to create solutions

Ideally, You Possess
• BS in Computer Science/Information Technology or 3-5 years of experience supporting IT/Operations or Development systems
• Operational experience with code repositories and versioning (Git, svn, github, perforce)
• System administration skills with relational and non-relational databases (MySQL, Cassandra, ElasticSearch, HBASE, redis, memcached)
• Solid understanding of virtualization principles, architectures, deployment and administration (VMware, Xen, KVM)
• Solid understanding of Storage, Networking and Compute systems (GCP, GCS, GCE, VPC, AWS, S3)
• Strong cloud hosting experience, Google Cloud Platform is a plus
• Ability to write tools to interface with Storage, Networking and Compute CLI and API interfaces
• Experience supporting a highly available environment

Challenges You Can Help Us Face
• How do we deploy code anywhere with relative ease?
• How do we scale private and public clouds to support Keap SaaS growth?
• How can we make system administration work push-button easy?

About Keap

In 2001, Keap (formerly Infusionsoft) pioneered the sales and marketing automation category for small business. Today, Keap is the #1 CRM platform in its category. We’re 400+ strong and seeking talented and intelligent people to help us on our mission of helping grow small businesses worldwide.

At Keap, we celebrate diversity and inclusion for the benefit of our employees, our products, our community, and to help small businesses succeed. We do not discriminate based on race, color, ethnicity, ancestry, national origin, religion, sex, gender, gender identity, gender expression, sexual orientation, age, disability, veteran status, marital status or any legally protected status.

Legal authorization to work in the U.S. is required. Keap will not sponsor new candidates for employment visas, now or in the future, for this job opening.

