Full-time Site Reliability Engineer openings in Boston on September 06, 2022

Site Reliability Engineer at Motion Recruitment Partners LLC

Location: Boston

• Full Time | $90k – $140k | Boston, Massachusetts | September 1st, 2022
• Title: Site Reliability Engineer
• Location: Boston, MA
• I am excited to be posting this role for a Site Reliability Engineer at a Boston SaaS company.
• This position will be for the first SRE this company has hired.
• This will be a development heavy SRE role where you would be focusing on back-end software development and scaling their cloud services.
• This company is a well-established software company in Boston that sells a Health Data Analytics platform to clients such as Pfizer.
• This software streamlines the processes that health professionals go through every single day.
• This program can train, test, and deploy explainable models quickly and easily.
• Required Skills & Experience
• Bachelor’s degree in Computer Science, Engineering, Math, or related technical/science field
• 2-5 years of software engineering experience with programming in Python, Ruby, or similar
• 1-2 years of SRE, DevOps, or Operations experience
• Significant experience with Linux system administration
• Pragmatic technologist/generalist with an open mind to work on different types of projects
• Software engineers with interest or experience in system automation
• Desired Skills & Experience
• Previous experience managing and monitoring operational systems
• Previous experience working with Spark, Cassandra, Alluxio or ElasticSearch
• What You Will Be Doing
• 50% Python OR Java and Ruby
• 50% Team Collaboration
• Paid Time Off (PTO)
• Applicants must be currently authorized to work in the US on a full-time basis now and in the future.
• Posted By: Stephen Calandra
Apply Here
For Remote Site Reliability Engineer roles, visit Remote Site Reliability Engineer Roles


Senior Site Reliability Engineer at Radancy

Location: Boston

The Radancy Programmatic Product and Engineering team is seeking a proven Senior SRE who is motivated by increasing team efficiency, collaborating with the team, and getting quality work done every day.

Studies have shown that women and people of color are less likely to apply for jobs unless they believe they meet every one of the qualifications in a job description. Our top priority is finding the best candidate for the job and if you are interested in the position, we would encourage you to apply, even if you don’t believe you meet every one of the qualifications below.
• Manage production assets for programmatic and data teams
• Responsible for batching, upgrading, and deploying new servers
• Prioritize, troubleshoot, and mitigate issues
• Partner with CTO to manage team workload and priorities
• Accountable for uptime, compliance, and performance monitoring activities, and for on-time delivery of quality products and services to all customers
• Support engineering efforts in partnership with engineering and product leadership
• Investigate system alarms and notifications
• Maintain excellent, timely communication with the operations, product, and engineering teams
• Proactively stay up to date with the latest technologies concerning Radancy’s Programmatic products and the underlying technologies
• Contribute documentation to build the programmatic knowledge base

• 3-5 years of coding experience, ideally in JS and Python
• Experience with SQL and shell scripting
• Bachelor’s degree in a STEM discipline
• Minimum of 3-5 years of experience as a Support Engineer in a data-driven software product
• Strong writing, communication, and presentation skills
• Must have exceptional soft skills, including the ability to articulate to both technical and business audiences
• Experience managing a geographically distributed development team
• Ability to work in a dynamic and flexible environment
• Experience with web development and AWS a plus
• Experience with JIRA a plus
Join the global leader in talent acquisition technologies that’s committed to finding new ways to leverage software, strategy and creative to enhance our clients’ employer brands – across every connection point. We’re looking for unconventional thinkers. Relentless collaborators. And ferocious innovators. Talented individuals who are ready to work towards solutions that transform the way employers and job seekers connect.
Radancy is an equal opportunity employer and welcomes all qualified applicants regardless of race, ethnicity, religion, gender, gender identity, sexual orientation, disability status, protected veteran status, or any other characteristic protected by law. We actively work to create an inclusive environment where all of our employees can thrive.


Lead Site Reliability Engineer at Embark Veterinary

Location: Boston

Who we are:

Join Embark on our mission to improve the life and longevity of dogs everywhere. Our canine DNA test – named the best by The New York Times – enables us to make scientific advances in personalized pet care. Embark was listed in Forbes’ ‘Next Billion Dollar Startups’, and with over 1 million dogs tested, we continue to make amazing paw-gress towards bettering the lives of dogs.

Join our pack! At Embark, our People First culture is centered around building an amazing team and giving everyone an opportunity to have a voice and make an impact. We support DEI&B, your continued growth & development, work-life balance, and are committed to making Embark the best place you’ve ever worked!

About the role:

As the technical leader of the pipeline team, you will develop the technical roadmap and architecture to improve the various components of our bioinformatics pipeline. Our bioinformatics pipeline is at the center of Embark’s core product offering and is built to help customers better understand and care for their dogs. This pipeline transforms raw genetic data into a dog’s breed mix, ancestry, dog relatives, health conditions and traits, and more! The components of the pipeline must run efficiently, reliably, and at a massive scale on AWS: thousands of nodes, hundreds of terabytes of data processing. You will work closely with the science teams to ensure that our reference architecture, observability tooling, testing frameworks and CI/CD systems enable us to quickly deliver new products to our customers!

What you’ll do:
• Optimize our use of AWS systems (RDS, S3, SQS, EKS, and more) to improve pipeline reliability and efficiency
• Lead cross-team conversations to ensure clear requirements from multiple stakeholders
• Lead team in discussions to define, share, and implement the long-term technical vision for the pipeline.
• Mentor engineers in better software engineering practices
• Keep on top of best practices and evolving trends in technologies the team uses
• Advocate for technical improvements
• Innovate with current and new technologies which will improve our current pipeline and observability stacks
• Design and build a reference architecture utilizing containers for multiple teams to use in a plug-n-play pipeline
• Build out our CI/CD stack, including improving our testing frameworks or implementing new ones
• Design, build and maintain core infrastructure as code (IaC)
• Implement logging, monitoring, and alerting across websites, batch processing systems, and other AWS-based services to detect and prevent production issues

What experience we’re looking for:
• 6+ years as a DevOps/SRE engineer
• Experience leading a team or major projects within a larger team
• Strong planning and communication skills – able to document out a deliverable plan in a way that others understand and trust
• In-depth, hands-on knowledge of AWS infrastructure: S3, EKS, CloudFormation, RDS, Lambda. Able to debug issues in the infrastructure and configuration.
• Experience with concepts such as microservices (including scalability best practices), containers, and container orchestration using Docker and/or Kubernetes
• Experience implementing CI/CD workflows in systems like CodeDeploy, GitHub Actions, or Bamboo.
• Experience implementing log aggregators and observability platforms
• Passionate about making lives easier for developers by enhancing tooling and alerting
• Working knowledge of several scheduling tools, can speak to real-world examples of where they work well and where they fall short
• Proficient in using Infrastructure as Code (IaC) with both YAML and JSON
• Proficient in scripting, utilizing Python and Bash
• Significant experience implementing data pipelines
• Able to quickly learn and make decisions regarding new frameworks and technologies.
• Willingness to participate in on-call rotation including some weekends and holidays

Research suggests that up to 60% of those identifying as under-represented individuals might have talked themselves out of applying at this point. Don’t worry about checking all the boxes. At Embark, we are committed to diversity, equity, inclusivity and belonging. We are focused on hiring team members who are excited to learn and grow with us, so please introduce yourself and let us know about you!

What we can offer:

At Embark, we might be dog lovers, but we’re passionate about people too. We’re committed to building an inclusive culture where all employees can belong and flourish. Here are some of our benefits and perks:
• A flexible vacation policy so you can take off the time you need when you need it
• Paid maternal and paternal leave (plus paw-ternity leave for new pets)
• Every other Friday off each summer
• Dog-friendly office near South Station, Boston (when we get back to the office), with some flexibility (e.g., work from home at least 2 days a week, flexibility around child care needs, etc.)
• Perks tailored for dog lovers including subsidized pet insurance and dog-walking services
• Startup perks with big-company benefits (401k match, a generous bonus structure, commuter benefits, top-of-the-line healthcare, HSA/FSA)
• Competitive salaries and equity participation – every employee gets stock options
• Fully-stocked office snack bar and regular office events
• New iMacs and MacBook Pros, or laptops running Linux
• Continuing education, including attending conferences and company-provided resources to help with individual growth and development

Embark is an equal opportunity workplace and values diversity at our company. We are committed to equal employment opportunities regardless of race, color, ancestry, religion, sex, national origin, citizenship status, sexual orientation, age, disability status, marital status, gender identity or expression, veteran status, or any other characteristics protected by federal, state or local laws. See also EEO is the Law.


Software Engineer / Site Reliability Engineer at SonicJobs

Location: Boston

UnitedHealth Group is a company that’s on the rise. We’re expanding in multiple directions, across borders and, most of all, in the way we think. Here, innovation isn’t about another gadget, it’s about transforming the health care industry. Ready to make a difference? Make yourself at home with us and start doing your life’s best work.(sm)

Optum Labs is the research and development arm of UnitedHealth Group. We’re a diverse team of curious thinkers and experts in big data, artificial intelligence and machine learning, and scientific and clinical research searching for new ways to help people live healthier lives. We partner with world leaders in health care delivery, research, and technology to create disruptive solutions that serve patients, caregivers, providers, and commercial and government payers.

You’ll enjoy the flexibility to telecommute* from anywhere within the U.S. as you take on some tough challenges.

Primary Responsibilities:
• Actively participate in on-call rotation for incident resolution for the platform and/or any dependent components which the product engineering teams rely on for their work. This will be no more than 25% of the time. The rest of the time will be automating and developing quality and operational improvements/solutions
• Maintain and improve operational tooling, frameworks, perform chaos engineering activities
• Perform root cause analysis and deliver resolution for tools and automation failures
• Build frameworks that test the performance and resiliency of our platform services/tools
• Build/integrate/administer systems and tools that enable engineering teams to observe their applications in production with autonomy (Dashboards, APMs)
• Automate alerts for metrics on performance, cost, vulnerabilities, risk, compliance violations
• Identify and measure SLOs, SLAs and SLIs
• Improve processes/runbooks and champion automation of any manual items around support

You’ll be rewarded and recognized for your performance in an environment that will challenge you and give you clear direction on what it takes to succeed in your role as well as provide development for other roles you may be interested in.

Required Qualifications:
• BA/BS in Computer Science, Engineering or related field or equivalent experience
• 3+ years developing cloud-native applications using one or more languages (TypeScript, C#, Java)
• 3+ years deploying and operating cloud-native applications in a public cloud (Azure preferred)
• 2+ years in a role of supporting software and/or cloud-infrastructure in an on-call rotation basis to help with identification and remediation of technical problems at the root cause
• In-depth and proactive communication skills around status of projects/issues in production
• Solid Git skills
• Full COVID-19 vaccination is an essential job function of this role. Candidates located in states that mandate COVID-19 booster doses must also comply with those state requirements. UnitedHealth Group will adhere to all federal, state and local regulations as well as all client requirements and will obtain necessary proof of vaccination, and boosters when applicable, prior to employment to ensure compliance. Candidates must be able to perform all essential job functions with or without reasonable accommodation

Preferred Qualifications:
• 3+ years implementing dashboards to help teams visualize logs, instrumentation, and other data to ensure optimal performance of the platform services, infra, and deployed applications (Grafana preferred)
• Experience with Docker and Kubernetes (Azure Kubernetes Service preferred)
• Experience using centralized logging solutions (Splunk preferred, ELK, etc.)
• Experience using active monitoring systems (Datadog, New Relic, etc.)
• Experience creating runbooks, processes, and test plans around reliability, performance, etc. of infra/applications
• Experience planning and supporting 99.99%+ availability for critical applications in production

We Lead with Diversity, Inclusion and Compassion

At Optum Labs, we are dedicated to building teams where every individual is recognized for their unique experience and contributions. Our Leadership Principles underscore our commitment to inclusion, encouraging us to walk in each other’s shoes and open doors for our peers.

UnitedHealth Group supports local, regional, and national organizations that share these values through joint initiatives, event and program participation, volunteerism and giving. Through our Connected Communities, employees can connect with others who have similar – or different – life experiences and backgrounds. These groups are led by peers, supported by Human Capital and championed by leaders.

We Invest in Talent

Managers at every level are committed to their roles as talent stewards who help guide and nurture professional development. We want our employees to reach their highest level of potential just as they help us reach ours. Join Optum Labs and you’ll be part of a culture that prizes innovation and works with uncompromising integrity.

At Optum Labs, employees are our first customers. That’s why we offer virtual work environments to provide work/life flexibility via telecommuting. While it can be a struggle to be a telecommuter, it can also provide enormous benefits for your personal and professional life.

To protect the health and safety of our workforce, patients and communities we serve, UnitedHealth Group and its affiliate companies require all employees to disclose COVID-19 vaccination status prior to beginning employment. In addition, some roles and locations require full COVID-19 vaccination, including boosters, as an essential job function. UnitedHealth Group adheres to all federal, state and local COVID-19 vaccination regulations as well as all client COVID-19 vaccination requirements and will obtain the necessary information from candidates prior to employment to ensure compliance. Candidates must be able to perform all essential job functions with or without reasonable accommodation. Failure to meet the vaccination requirement may result in rescission of an employment offer or termination of employment.

Careers at UnitedHealth Group. We have modest goals: Improve the lives of others. Change the landscape of health care forever. Leave the world a better place than we found it. Such aspirations tend to attract a certain type of person. Crazy talented. Compassionate. Driven. To these select few, we offer the global reach, resources and can-do culture of a Fortune 5 company. We provide an environment where you’re empowered to be your best. We encourage you to take risks. And we offer a world of rewards and benefits for performance. We believe the most important is the opportunity to do your life’s best work.(sm)

Colorado, Connecticut or Nevada Residents Only: The salary range for Colorado residents is $66,100 to $118,300. The salary range for Connecticut / Nevada residents is $72,800 to $129,900. Pay is based on several factors including but not limited to education, work experience, certifications, etc. In addition to your salary, UnitedHealth Group offers benefits such as a comprehensive benefits package, incentive and recognition programs, equity stock purchase and 401k contribution (all benefits are subject to eligibility requirements). No matter where or when you begin a career with UnitedHealth Group, you’ll find a far-reaching choice of benefits and incentives.
*All Telecommuters will be required to adhere to UnitedHealth Group’s Telecommuter Policy.

Diversity creates a healthier atmosphere: UnitedHealth Group is an Equal Employment Opportunity/Affirmative Action employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, age, national origin, protected veteran status, disability status, sexual orientation, gender identity or expression, marital status, genetic information, or any other characteristic protected by law.

UnitedHealth Group is a drug-free workplace. Candidates are required to pass a drug test before beginning employment.


Lead Site Reliability Engineer Opportunity with Boston, MA Based Pet SaaS Company at Motion Recruitment

Location: Boston

Location: Boston, MA (Hybrid Remote)
Title: Lead Site Reliability Engineer

Required Technology:
– Terraform
– Comfortable with Coding

Salary (% annual bonus):
Min: $160,000 base
Max: $180,000 base


Senior Manager, Software Engineering – Site Reliability Engineering (Remote Eligible) at Capital One

Location: Boston

Locations: VA – McLean, United States of America, McLean, Virginia
Sr. Manager, Software Engineering – Site Reliability Engineering (Remote Eligible)

Do you love building and pioneering in the technology space? Do you enjoy solving complex business problems in a fast-paced, collaborative, inclusive, and iterative delivery environment? At Capital One, you’ll be part of a big group of makers, breakers, doers and disruptors, who love to solve real problems and meet real customer needs. We are seeking DevOps Engineers who are passionate about marrying data with emerging technologies to join our team. As a DevOps Engineer, you’ll have the opportunity to be on the forefront of driving a major transformation within Capital One.

What You’ll Do:
• Lead a portfolio of diverse technology projects and a team of developers with deep experience in machine learning, distributed microservices, and full stack systems
• Share your passion for staying on top of tech trends, experimenting with and learning new technologies, participating in internal & external technology communities, and mentoring other members of the engineering community; from time to time, you will also be asked to code or evaluate code
• Collaborate with digital product managers, and deliver robust cloud-based solutions that drive powerful experiences to help millions of Americans achieve financial empowerment
• Utilize programming languages like Java, Python, SQL, Ruby and Go, Container Orchestration services including Docker and Kubernetes, CM tools including Ansible and Terraform, and a variety of AWS tools and services

Capital One is open to hiring a Remote Employee for this opportunity.

Basic Qualifications:
• Bachelor’s degree
• At least 8 years of experience in DevOps Engineering (Internship experience does not apply)
• At least 4 years of experience with Cloud Native technologies (Amazon Web Services, Microsoft Azure, Google Cloud Platform)
• At least 6 years of Unix or Linux system administration experience
• At least 4 years of experience in people management

Preferred Qualifications:
• 9+ years of DevOps Engineering experience
• 6+ years of experience with coding and scripting (Python, SQL, Java, JavaScript, Golang, Bash, Perl or Ruby)
• 4+ years of experience in infrastructure design, implementation and delivery
• 3+ years of experience with monitoring tools (Splunk or Zabbix)
• 3+ years of experience with Container orchestration services including Docker or Kubernetes
• 3+ years of experience working with Agile Development Practices

Capital One will consider sponsoring a new qualified applicant for employment authorization for this position. No agencies please. Capital One is an Equal Opportunity Employer committed to diversity and inclusion in the workplace. All qualified applicants will receive consideration for employment without regard to sex, race, color, age, national origin, religion, physical and mental disability, genetic information, marital status, sexual orientation, gender identity/assignment, citizenship, pregnancy or maternity, protected veteran status, or any other status prohibited by applicable national, federal, state or local law. Capital One promotes a drug-free workplace. Capital One will consider for employment qualified applicants with a criminal history in a manner consistent with the requirements of applicable laws regarding criminal background inquiries, including, to the extent applicable, Article 23-A of the New York Correction Law; San Francisco, California Police Code Article 49, Sections 4901-4920; New York City’s Fair Chance Act; Philadelphia’s Fair Criminal Records Screening Act; and other applicable federal, state, and local laws and regulations regarding criminal background inquiries.

If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation, please contact Capital One Recruiting at 1-800-304-9102 or via email at (see below). All information you provide will be kept confidential and will be used only to the extent required to provide needed reasonable accommodations.

For technical support or questions about Capital One’s recruiting process, please send an email to (see below)

Capital One does not provide, endorse nor guarantee and is not liable for third-party products, services, educational tools or other information available through this site.

Capital One Financial is made up of several different entities. Please note that any position posted in Canada is for Capital One Canada, any position posted in the United Kingdom is for Capital One Europe and any position posted in the Philippines is for Capital One Philippines Service Corp. (COPSSC).


Lead Site Reliability Engineer at Federal Reserve Bank of New York

Location: Boston

• Federal Reserve Bank of Boston
• The Federal Reserve is developing a new interbank 24x7x365 real-time gross settlement (RTGS) service with integrated clearing functionality, called the FedNow service.
• This service will help enable financial institutions to provide their customers with the ability to send and receive payments any time, any day, and have full access to those funds within seconds.
• This position is a unique opportunity to be part of a new mission-critical Federal Reserve initiative that will be transformative to the payments landscape in the United States.
• A requirement of this position is that the employee must be fully vaccinated against COVID-19 or qualify for an accommodation from the Bank’s vaccination policy; individuals who are unable to be vaccinated due to a medical condition or sincerely held religious belief may request an accommodation from the Bank.
• Operate the production environment for the program.
• You will implement and leverage solution monitoring and tooling to be used for capacity planning, utilization reporting, and scaling.
• The team uses open source and proprietary software to support Engineering, DevOps, and DevSecOps tools, services, and solutions.
• The SRE / Production Operations team is part of the Technical Operations (TechOps) department and has the overall responsibility for the design, management and execution of operations required to support the ongoing technical and delivery needs of the FedNow Program, as well as the transition to production support and operations.
• This team interfaces with internal stakeholders and customers for planning, delivery, and service management.
• It owns ongoing ITIL processes, and the implementation and driving of continuous improvement initiatives.
• You will work closely with Engineers and Architects of the FedNow program in order to maintain seamless automation across the entire platform.
• The ideal candidate is someone who loves building and maintaining reliable and scalable systems, CI/CD tooling, and automating cloud-based highly available, high performing applications.
• What will be expected of you-
• Provide expertise to the Engineering, DevOps, and QA teams.
• Leverage SRE best practices.
• Work with open-source technologies as needed.
• Work with CI and CD tools, and source control such as Git and SVN.
• Lead the team through continuous improvement of production operations.
• Offer technical support where needed and develop automation software to speed incident resolution.
• Stay current with industry trends and source new ways for our business to improve.
• Implement, configure, and operate tools and products in the DevOps and DevSecOps Toolchain.
• Build and maintain tools, services, and automations associated with deployment and our operations platform, ensuring that all meet our customer service standards and reduce errors.
• Actively troubleshoot any issues that arise in production.
• Update our processes/documentation and design new tools and processes as needed.
• Deploy product updates as required while implementing integrations when they arise.
• Automate our operational processes as needed, accurately and in compliance with security standards.
• Expertise you will bring-
• Extensive knowledge and understanding of working in AWS environments & services.
• Experience supporting infrastructure for large multi-services applications.
• Experience working with continuous deployment in micro-services architectures.
• Proficiency in scripting languages.
• Experience working with Linux.
• Experience with APM tools.
• Experience working with configuration management tools.
• Working knowledge of databases.
• Develop and maintain environment documentation and support procedures.
• Continually advance technical knowledge and skills.
• Ensure compliance with Federal Reserve policies and standards.
• Expertise in test automation and tooling.
• Knowledge of operating systems (Linux, Unix, Windows).
• Knowledge of database testing.
• Knowledge of technology project and secure coding standards.
• PLEASE NOTE: Position is responsible for 24×7 support for some systems.
• As such, the Lead Site Reliability Engineer is required to occasionally work off-hours for maintenance or to resolve technical issues.
• Logistics & Requirements-
• Bachelor’s degree in Computer Science, Information Systems, or equivalent background or experience.
• Candidate must possess demonstrated experience with job duties outlined in this description.
• 3+ years of supporting production cloud environments.
• Server administration certification and experience preferred.
• Strong analytic and problem solving skills.
• Self-motivated individual with the ability to prioritize and manage changing priorities.
• Strong customer service skills.
• Independent critical thinking and decision-making abilities.
• Excellent written and oral communication abilities.
• The Federal Reserve System is committed to a diverse and inclusive workplace and to provide equal employment opportunities to all persons without regard to race, color, religion, national origin, sex, sexual orientation, gender identity, age, genetic information, disability, or military service.
• All employees assigned to this position will be subject to FBI fingerprint/ criminal background and Patriot Act/ Office of Foreign Assets Control (OFAC) watch list checks at least once every five years.
• The above statements are intended to describe the general nature and level of work required of this position.
• They are not intended to be an exhaustive list of all duties, responsibilities or skills associated with this position or the personnel so classified.
• While this job description is intended to be an accurate reflection of this position, management reserves the right to revise this or any job description at its discretion at any time.
• For this job, any offer of employment is contingent upon successfully passing a two-phase security screening.
• The first phase consists of the satisfactory completion of a physical examination (including a drug screening), reference checks, and a security investigation consisting of credit and criminal history checks.
• The second phase, which might not be complete until after you begin working at the Reserve Bank, is an additional risk-based security screening determined by the risk rating of the position.
• Depending upon the sensitivity of the position, this phase may include, and is not limited to, work and residency eligibility verification, and personal interviews with the candidate, references, and prior employers.
• All applicants must have resided in the United States for at least three (3) years.
• The Federal Reserve Banks believe that diversity and inclusion among our employees is critical to our success as an organization, and we seek to recruit, develop and retain the most talented people from a diverse candidate pool.
• The Federal Reserve Banks are committed to equal employment opportunity for employees and job applicants in compliance with applicable law and to an environment where employees are valued for their differences.
Apply Here
For Remote Lead Site Reliability Engineer roles, visit Remote Lead Site Reliability Engineer Roles


Site Reliability Engineering Manager – Velocity at Klaviyo

Location: Boston

At Klaviyo, we value the unique backgrounds, experiences and perspectives each Klaviyo (we call ourselves Klaviyos) brings to our workplace each and every day. We believe everyone deserves a fair shot at success and appreciate the experiences each person brings beyond the traditional job requirements. If you’re a close but not exact match with the description, we hope you’ll still consider applying.
Site Reliability Engineering (SRE) is what you get when you treat system operations as a software engineering problem. The mission of the Site Reliability Engineering group is to provide services, tooling, and guidance to Klaviyo’s product engineers to make them more productive and ensure their services are sufficiently reliable, scalable, and secure. The Velocity team makes it really easy for engineers at Klaviyo to develop applications, regardless of size, as well as safely deploy to production, such that there is one way of deploying our custom application code at Klaviyo. This deployment model enables hundreds of small pushes a day. Additionally, automation and standards exist for every step of developer life such that product engineers can self-service their containerized, cloud-based local environments as well as answer their development questions.
How You’ll Make a Difference
• Manage 3-6 Site Reliability Engineers in Klaviyo’s Boston office and remotely.
• Help individuals on your team develop and execute SMART goals and personal development plans that align with Klaviyo’s goals and objectives, and understand how their work fits into the bigger picture.
• Interview, hire, and level up the Velocity Engineering team.
• Work with the team on project planning and defining milestones, identifying dependencies, and meeting business goals.
• Participate in deep system design and implementation discussions within your team and across partner teams to ensure that we’re building the right systems and keeping quality high.
• Level up the team through hands-on coaching and individual contribution. This includes pairing with direct reports to design, write, and deliver software to improve the scalability, reliability, and security of Klaviyo’s systems.
• Iterate and improve upon engineering-wide processes like recruiting, onboarding, performance management, communication, and Agile software development.
Who You Are
• Successfully led and delivered infrastructure projects spanning multiple quarters and involving input from multiple external stakeholders.
• Experience coaching and growing Site Reliability Engineers.
• Experience developing and rolling out engineering-wide processes.

Get to Know Klaviyo

Klaviyo is a world-leading marketing automation platform dedicated to accelerating revenue and customer connection for online businesses. Klaviyo makes it easy to store, access, analyze and use transactional and behavioral data to power highly-targeted customer and prospect communications. The company’s hybrid customer-data and marketing-platform model allows companies to grow by fostering direct relationships with customers, without giving up their valuable data to popular big-tech ad platforms. Over 265,000 innovative companies like Unilever, Custom Ink, Living Proof and Huckberry sell more with Klaviyo. Learn more at www.klaviyo.com.

If you are a Colorado resident and this role is a remote role, you can receive additional information about the compensation and benefits for this role, which we will provide upon request. Requests can be submitted here. Additional information regarding benefits can be found here.

Klaviyo is committed to diversity and to a policy of equal employment opportunity and non-discrimination. We do not discriminate on the basis of race, color, religion, national origin, age, sex, marital status, ancestry, physical or mental disability, veteran status, gender identity, sexual orientation or any other characteristic protected by applicable law.
Apply Here
For Remote Site Reliability Engineering Manager – Velocity roles, visit Remote Site Reliability Engineering Manager – Velocity Roles


Site reliability engineer at Clear Ventures

Location: Boston

Toast is driven by building the restaurant platform that helps restaurants adapt, take control, and get back to what they do best: building the businesses they love.

At Toast, our Site Reliability Engineers (SREs) are responsible for keeping all customer-facing services and other Toast production systems running smoothly.

SREs are a blend of pragmatic operators and software craftspeople who apply sound software engineering principles, operational discipline, and mature automation to our environments and our codebase.

Our decisions are based on instrumentation and continuous observability as well as through predictions and capacity planning.

About this roll* (Responsibilities)
• Automate collection and analysis of metrics from distributed systems to assist in performance tuning and fault finding
• Create sustainable systems and services through automation, triage & feedback
• Build strong partnerships with development teams to improve services through rigorous testing and release procedures
• Lead system design consulting, platform management, and capacity planning
• Balance feature development speed and reliability with well-defined service level objectives
• Lead sustainable incident response and blameless postmortems

Do you have the right ingredients*? (Requirements)
• Polyglot technologist / generalist with a thirst for learning
• Deep understanding of cloud and microservice architecture, and the JVM
• Experience with tools such as APM, Terraform, Ansible, GitHub, Jenkins, Docker
• Experience developing software or software projects ideally utilizing Java
• Extensive and broad industry experience with at least 8 years of engineering experience and a recent in-depth focus on SRE and / or DevOps roles
• Effective leadership & communication skills to be able to provide technical leadership on large scale projects

Our Spread* of Total Rewards
• Unlimited Vacation
• Sabbatical opportunity after five years
• Professional Development Reimbursement Program
• Commitment to Employee Wellness through resources such as a quarterly Wellness Stipend
• Various peer and company recognition programs
• 401(k) and matching
• Medical, Dental, & Vision Coverage
• Mental Health Benefits
• Subsidized backup childcare
• Bread puns encouraged but not required

We are Toasters

Diversity, Equity, and Inclusion is Baked into our Recipe for Success.

At Toast our employees are our secret ingredient. When they are powered to succeed, Toast succeeds.

The restaurant industry is one of the most diverse industries. We embrace and are excited by this diversity, believing that only through authenticity, inclusivity, high standards of respect and trust, and leading with humility will we be able to achieve our goals.

Baking inclusive principles into our company and diversity into our design provides equitable opportunities for all and enhances our ability to be first in class in all aspects of our industry.

Bready* to make a change? Apply today!

Toast is committed to creating an accessible and inclusive hiring process. As part of this commitment, we strive to provide reasonable accommodations for persons with disabilities to enable them to access the hiring process.

If you need an accommodation to access the job application or interview process, please contact email protected .

Last updated: 2022-09-06
Apply Here
For Remote Site reliability engineer roles, visit Remote Site reliability engineer Roles


Site Reliability Engineer at MassMutual

Location: Boston

Do you want to be part of a team that encourages your growth, supports your ambitions and makes it a priority for you to reach your goals? Is helping people part of who you are? At MassMutual, we help millions of people find financial freedom, offer financial protection and plan for the future. We do this by building trust with our customers by being knowledgeable problem solvers and prioritizing their needs above all else. We Live Mutual.

Description

Technical Design, Development & Problem Solving
• Highly seasoned professional with specialized skills/knowledge who is recognized across the organization for their expertise.
• Experience independently developing algorithms and delivering results in production environments at scale.
• Ability to take on proofs of concept to prove new technologies.
• Problem solve with little to no guidance.

Execution & Delivery
• Ability to independently execute complex, long-term projects.
• Publish, patent, and/or open source novel work.
• Optimize the delivery approach (is it being delivered well, is it being delivered with quality, and is the team using methodology).
• Understand the context of the work rather than just focusing on what is assigned.
• Excellent communicator who collaborates well with people on the team and in other organizations.
• Innovates, is open to ideas to improve efficiencies, and works with the team to pivot if necessary.

Software Engineering Excellence
• Responsible for code excellence and quality of the assigned project, assisting the lead with suggestions and design as needed.
• Research tools being used by the department and provide suggestions for improvements.
• Enhance continuous delivery and integration processes for the team.
• Create reusable code frameworks or example projects to aid developers on the team.
• Advocate for and lead initiatives for enhancing our testing frameworks.

Leadership
• Mentor core developers.
• Partner with business stakeholders and product managers to define project timelines and deliverables at each project stage.
• Awareness of and involvement in identifying the resources required for project delivery, including resources from other departments. May not take ownership of these communications, but assists the Lead when relevant.

Basic Qualifications
• Bachelor’s degree in Computer Science or a related field
• 5+ years in architecting and implementing fully automated (IaC/Terraform), secure, reliable, scalable & resilient hybrid-cloud solutions
• Must have hands-on experience with Kubernetes, microservices architecture
• Experience with DevOps concepts, tools (containers, CI/CD (GitHub, Jenkins, Artifactory, Helm), Chef, Ansible, Puppet, etc.) and emerging technologies
• Experience with observability tools such as Splunk, New Relic, PagerDuty
• Experience with infrastructure systems that support enterprise data science and analytics capabilities, including streaming and real-time analytics (Kafka, Spark Streaming, and Snowplow)
• Experience with on-prem to cloud migration
• Experience with cloud security toolsets (Prisma, ZeroNorth, Wiz, JFrog Xray, CloudWatch, etc.)
• Exposure to network infrastructure (e.g., setting up and managing firewalls, WAFs, network segregation, VPNs and network ACLs)
• Strong written and verbal communication skills
• Able to thrive in a collaborative and cross-functional environment
• AWS/Azure associate certification

Preferred Qualifications
• 7+ years of experience in AWS/Azure cloud
• CKA (Certified Kubernetes Administrator) or CKAD (Certified Kubernetes Application Developer) certification
• Subject matter expert in Cloud Security and/or Cloud Networking
• AWS/Azure certification, preferably at professional level

Why Join Us

We’ve been around since 1851. During our history, we’ve learned a few things about making sure our customers are our top priority. In order to meet and exceed their expectations, we must have the best people providing the best thinking, products and services. To accomplish this, we celebrate an inclusive, vibrant and diverse culture that encourages growth, openness and opportunities for everyone. A career with MassMutual means you will be part of a strong, stable and ethical business with industry leading pay and benefits. And your voice will always be heard.

We help people secure their future and protect the ones they love. As a company owned by our policyowners, we are defined by mutuality and our vision to put customers first. It’s more than our company structure – it’s our way of life. We are a company of people protecting people. Our company exists because people are willing to share risk and resources, and rely on each other when it counts. At MassMutual, we Live Mutual.

MassMutual is an Equal Employment Opportunity employer Minority/Female/Sexual Orientation/Gender Identity/Individual with Disability/Protected Veteran. We welcome all persons to apply. Note: Veterans are welcome to apply, regardless of their discharge status.
Apply Here
For Remote Site Reliability Engineer roles, visit Remote Site Reliability Engineer Roles


The Tech Career Guru