Full-time Site Reliability Engineer openings in California on September 13, 2022

Site Reliability Engineer (SRE) (Ref #2219-L) at Toyon Research Corporation

Location: Goleta

Toyon Research Corporation is seeking a full-time, experienced Site Reliability Engineer (SRE) passionate about solving problems to work in either our Goleta, CA or Fort Collins, CO office. Our SRE team is relied upon to empower our users and IT teams with rich, automated technologies and solutions, reducing toil, technical debt, and downtime. Specifically, we are searching for someone with a strong background in problem-solving, with a vigorous drive to learn and grow their technical knowledge while improving service reliability across the organization.

Responsibilities will include:
• Create automation tooling & code to help reduce toil and minimize error-prone manual processes
• Work with monitoring infrastructure to develop automated responses/remediation and address the underlying issues that generate alerts
• Expand observability to improve decision making and reduce time to resolution
• Work in tandem with our service desk & systems administration teams to produce tools that improve effectiveness
• Work towards the continual improvement of systems performance, reliability, and compliance
• Anticipate and solve customer needs, with an eye towards improving capabilities, code & processes
• Improve processes & documentation around processes
• Create, maintain, and improve CI/CD pipelines
• Containerize existing workloads
• Incident response handling and root cause analysis


Preferred Skills & Qualifications:
• Proficiency in one or more of the following scripting or programming languages: Python, Go, Bash, PowerShell, Java
• Experience with NoSQL and SQL Databases
• Experience building services by leveraging web APIs
• Experience with containers and container orchestration (Docker, Podman, Kubernetes, etc.)
• Experience with log management/aggregation platforms (Splunk, Elastic Stack, etc.)
• Strong familiarity with Linux, macOS, and Windows command-line interfaces
• Familiarity with system automation/configuration management such as Ansible, Puppet, Chef, Salt
• Self-starter with interest in expanding existing knowledge of technical systems and SRE fundamentals
• Excellent critical thinking & problem-solving skills
• Desire to eliminate manual and repetitive tasks through automation
• Desire to continuously learn and improve

U.S. Citizenship is Required. Ability to qualify for a US Department of Defense security clearance required.
Learn more about our company in our latest video, We are Toyon.

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.



Senior Site Reliability Engineer at Got It AI

Location: Burlingame

Got It AI is transforming the human-to-human and human-to-machine conversational ecosystem by developing the world’s first fully autonomous conversational AI system using state-of-the-art technologies. With an experienced leadership team that has a solid track record, Got It AI is delivering impact across multiple applications and verticals such as education, finance, and e-commerce.

The Core Job
• Build tools and solutions that will drive continuous improvement around monitoring, performance, reliability, availability & scalability of our infrastructure.
• Build and deploy Machine Learning solutions on AWS Cloud infrastructure.
• Responsible for capacity planning, monitoring service KPIs and enforcing SLAs.
• Operate as a primary point of contact for production incident response and perform root cause analysis.
• Optimize for security, performance, and cost.

You Must Have
• 5+ years of experience as an SRE in start-up environments.
• Strong experience deploying and maintaining Kubernetes infrastructure.
• Deep understanding and experience with AWS services (VPC, EKS, RDS, CloudFront, CloudWatch, S3, etc).
• Deep understanding of service metrics through the development of dashboards, service KPIs, monitoring and alarming systems (Prometheus)
• Experience deploying and maintaining CI/CD pipelines (Jenkins, CodePipeline, etc) and Infrastructure as Code frameworks (Terraform, CloudFormation)
• Experience with Linux/UNIX administration, configuration, networking infrastructure and security.
• Skilled in automating tasks with scripting languages such as Python, Bash.
• Experience debugging system, networking and security issues on large-scale distributed systems.
• Experience working in an operational environment with strict SLAs and managed incident response and disaster recovery strategies.
• Excellent communication skills.
• BS/MS in Computer Science or equivalent.

Extra Bonus For
• Certified Kubernetes Administrator (CKA) certification
• AWS Certifications

We offer excellent benefits, competitive compensation, and equity:
• Medical, dental, and vision insurance
• 401(k)
• Paid parental leave


Sr Site Reliability Engineer at Myriad Genetics

Location: South San Francisco

Design, analyze and troubleshoot distributed systems. Use AWS Cloud to build, configure, automate and maintain highly reliable, scalable and secure services & infrastructure (EC2, IAM, Security Groups, Route 53, and S3). Advise on systems architecture, performance, capacity planning, and monitoring. Install and configure EC2 instances and create CI/CD pipelines, frameworks & methodologies. Use Python to automate Infrastructure as a Service, Identity Services, Container Orchestration, Configuration Management and Monitoring. Lead containerization using Docker & Kubernetes. Diagnose/resolve problems in web apps/network using Linux. Provide Linux system administration. Develop/maintain performant RESTful APIs. Configure/administer VMware cluster including virtual machine management. Support NetApp storage devices.

Requirements: BS in Computer Science or Electronics Engineering, plus 5 years of progressive software systems engineering experience (including at least 2 years in job offered), and proficiency with Linux system admin, Python, Shell Scripting, Puppet, Docker, Kubernetes, AWS, GoCD, Active Directory, NetApp, Swiftstack object storage, VMware, CI/CD, GitHub, and NetBackup.

Myriad Genetics, Inc. is a leading personalized medicine company dedicated to being a trusted advisor transforming patient lives worldwide with pioneering molecular diagnostics. Myriad discovers and commercializes molecular diagnostic tests that determine the risk of developing disease, accurately diagnose disease, assess the risk of disease progression, and guide treatment decisions across six major medical specialties where molecular diagnostics can significantly improve patient care and lower healthcare costs. Myriad is focused on three strategic imperatives: maintaining leadership in an expanding hereditary cancer market, diversifying its product portfolio through the introduction of new products, and increasing the revenue contribution from international markets. For more information on how Myriad is making a difference, please visit the Company’s website.

WE ARE AN EQUAL OPPORTUNITY EMPLOYER. Applicants and employees are considered for positions and are evaluated without regard to mental or physical disability, race, color, religion, gender, national origin, age, genetic information, military or veteran status, sexual orientation, marital status or any other protected Federal, State/Province or Local status unrelated to the performance of the work involved.

Please answer all questions completely. Please do not provide any information not specifically requested on this Employment Application form.


Site Reliability Engineer- (multiple levels) at Salesforce

Location: San Francisco

To get the best candidate experience, please consider applying for a maximum of 3 roles within 12 months to ensure you are not duplicating efforts.

Job Category

Products and Technology

Job Details

Job Description

Site Reliability Engineer – All Levels – (Senior/Lead/Principal) (Multiple Locations)

Note: By applying to the Site Reliability Engineer posting, recruiters and hiring managers across the organization hiring Site Reliability Engineers will review your resume. Our goal is for you to apply once and have your resume reviewed by multiple hiring teams.

Are you an upcoming or recent graduate (within the past 2.5 years)? Please check out our FutureForce program at www.salesforce.com/futureforce. We appreciate your interest but we are seeking industry experienced engineers.

Salesforce, the Customer Success Platform and world’s #1 CRM, empowers companies to connect with their customers in a whole new way. The company was founded on three disruptive ideas: a new technology model in cloud computing, a pay-as-you-go business model, and a new integrated corporate philanthropy model. These founding principles have taken our company to great heights, including being named one of Forbes’s “World’s Most Innovative Company” five years in a row and one of Fortune’s “100 Best Companies to Work For” eight years in a row. We are the fastest growing of the top 10 enterprise software companies, and this level of growth equals incredible opportunities to grow a career at Salesforce. Together, with our whole Ohana (Hawaiian for “family”) made up of our employees, customers, partners and communities, we are working to improve the state of the world.

About Salesforce Tech And Product Engineering

Our Tech and Product team is tasked with innovating and maintaining a massive distributed systems engineering platform that ships hundreds of features to production for tens of millions of users across all industries every day. Our users count on our platform to be highly reliable, lightning fast, supremely secure, and to preserve all of their customizations and integrations every time we ship. Our platform is deeply customizable to meet the differing demands of our vast user base, creating an exciting environment filled with complex challenges for our hundreds of agile engineering teams every day.

Check out our “We are Salesforce Engineering” video

Departmental Description

Salesforce is seeking an engineering candidate to join the Site Reliability organization in one of our US locations. Working closely with counterparts in the Infrastructure and R&D organizations, this organization provides a global team of engineers monitoring cloud service availability and ready to swiftly repair any service-impacting issues. Seven days a week, 24 hours a day, in a follow-the-sun model, the Site Reliability team keeps the Salesforce cloud and our customers protected. As a member of the Site Reliability team, you will be tasked with detecting and resolving incidents within minutes. This objective is met by monitoring the services, reacting to problems, and proactively addressing issues before they affect performance or availability.

Position Description

When not fighting fires, the team is responsible for fire prevention through monitoring, automation, self-healing and resiliency initiatives, destructive testing, and game day exercises. The incumbent in this role would demonstrate a strong focus on tactical operations, as well as large-scale production engineering and orchestration.
• Keep the customer-facing services available at top performance by maintaining the constant health of the supporting systems.
• Incident management – Act in key response roles during major incidents, e.g. Sev0, Sev1. Also, participate in the technical review of the incident for problem management
• Problem management – Populate and participate in Root Cause Analyses (RCAs) and hand them off to the Global Solutions team
• Ensuring that work carried out by the Site Reliability team is performed in such a way as to align with the company’s internal compliance policy and directives
• Being available to discuss and resolve technical issues and escalations with other technical staff as the need arises
• Work with and lead other members of the team in staying on top of key industry innovation and technology, and assist in team development growth
• Identifying work opportunities and preparing or assisting with the preparation of technical proposals as the need arises
• Ability to operate in a fast-paced environment, troubleshoot complex issues quickly, and successfully handle multiple priorities
• Work to automate detection and resolution of recurring issues in the production environment

Basic Requirements
• Bachelor’s degree in Computer Science or related field OR equivalent experience
• Systems engineering experience in enterprise scale internet service engineering or related role
• Expertise in TCP/IP related technologies (networking protocols, network programming, etc.)
• Expertise in CLI enterprise support of Unix variants (Linux/Solaris/BSD) as well as strong Linux/UNIX knowledge with significant exposure to Red Hat Enterprise Linux and Solaris
• Experience with monitoring implementations and administration
• Strong communication skills (Written and Oral)
• Past experience in Incident Management and ITIL service operations
• Experience in working in a 24/7 team managing large data centers
• Participate in the team’s on-call rotation to address complex problems in real-time and keep services operational and highly available

Preferred Qualifications
• Master’s degree in Computer Science
• Perl/Python/Bash scripting experience
• Prior Chef/Puppet or automated deployment experience
• Experience in maintaining monitoring and alerting systems
• Experience troubleshooting relational databases and distributed platforms
• Experience in maintaining Java applications
• Experience in Docker orchestration and management.
• Experience with Kubernetes
• Hands-on experience configuring and managing AWS (Amazon Web Services), using the CLI/SDKs
• Experience handling systems monitoring and alerts.
• Experience with JVM optimization and Java server technologies like Tomcat or Jetty

Benefits & Perks

We have a public-facing website, salesforcebenefits.com, that explains our various benefits, including wellbeing reimbursement, generous parental leave, adoption assistance, fertility benefits, and more. Visit salesforcebenefits.com for the full breakdown!

Open to Fully Remote, Flex (1-3 days/week in the office), or Office-Based (4-5 days/week in office)


If you require assistance applying for open positions due to a disability, please submit a request via this Accommodations Request Form.

Posting Statement

At Salesforce we believe that the business of business is to improve the state of our world. Each of us has a responsibility to drive Equality in our communities and workplaces. We are committed to creating a workforce that reflects society through inclusive programs and initiatives such as equal pay, employee resource groups, inclusive benefits, and more. Learn more about Equality at Salesforce and explore our benefits.

Salesforce.com and Salesforce.org are Equal Employment Opportunity and Affirmative Action Employers. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status. Salesforce.com and Salesforce.org do not accept unsolicited headhunter and agency resumes. Salesforce.com and Salesforce.org will not pay any third-party agency or company that does not have a signed agreement with Salesforce.com or Salesforce.org.

Salesforce welcomes all.

Pursuant to the San Francisco Fair Chance Ordinance and the Los Angeles Fair Chance Initiative for Hiring, Salesforce will consider for employment qualified applicants with arrest and conviction records.


Site Reliability Engineer-Splunk at Tesla, Inc.

Location: Fremont

Job Category: Engineering & Information Technology
Location: Fremont, California
Req. ID: 148009
Job Type: Full-time
Engineer, Reliability, Automotive


Senior Site Reliability Engineer at SonicJobs

Location: San Jose

Growing 5G company in Milpitas, CA is looking for a Senior Site Reliability Engineer to join the team!

This Jobot Job is hosted by: Taylor Buckelew
Are you a fit? Easy Apply now by clicking the “Apply Now” button and sending us your resume.
Salary: $155,000 – $185,000 per year

A bit about us:

Located in beautiful Milpitas, CA, we are the first 5G wireless company that is eliminating hardware and providing true NLOS performance.

We are looking to hire a Senior Site Reliability Engineer, who will help us manage software that runs on the cloud and remotely manages millions of radio devices.

Why join us?

We offer a comprehensive compensation package including but not limited to:
• A competitive base salary ranging from $155K-$185K (DOE) + EQUITY!
• Full Benefits (Medical, Dental, Vision, Life)
• 401K + match!
• Generous Paid Time Off/Holiday/Sick time
• Mentorship / Growth Opportunities
• Friendly/collaborative team
Job Details

You will be responsible for and must have ALL or MOST of the following:

Job Responsibilities:
• Set up and manage Kubernetes and Istio-based clusters for deploying applications
• Set up and manage other services needed by the application (e.g. AWS Kafka, ElasticSearch, Redis, RDS databases, etc.)
• Manage development, test, staging, and production AWS environments
Required Skills and Experience:
• Strong Linux administration experience
• At least 3 years hands-on experience in AWS
• Strong troubleshooting abilities; you should be a hacker
• Monitoring and Alerting using Prometheus & Grafana
• Experience with AWS, Linux, Terraform, troubleshooting, Kubernetes and Microservices
IF INTERESTED, APPLY DIRECTLY HERE OR EMAIL ME A RESUME AT https://apply.jobot.com/jobs/senior-site-reliability-engineer/179086744/?utm_source=CareerBuilder
Interested in hearing more? Easy Apply now by clicking the “Apply Now” button.


Sr. Site Reliability Engineer at Zscaler

Location: San Jose

Company Description

Zscaler (NASDAQ: ZS) accelerates digital transformation so that customers can be more agile, efficient, resilient, and secure. The Zscaler Zero Trust Exchange is the company’s cloud-native platform that protects thousands of customers from cyberattacks and data loss by securely connecting users, devices, and applications in any location.

With more than 10 years of experience developing, operating, and scaling the cloud, Zscaler serves thousands of enterprise customers around the world, including 450 of the Forbes Global 2000 organizations. In addition to protecting customers from damaging threats, such as ransomware and data exfiltration, it helps them slash costs, reduce complexity, and improve the user experience by eliminating stacks of latency-creating gateway appliances.

Zscaler was founded in 2007 with a mission to make the cloud a safe place to do business and a more enjoyable experience for enterprise users. Zscaler’s purpose-built security platform puts a company’s defenses and controls where the connections occur—the internet—so that every connection is fast and secure, no matter how or where users connect or where their applications and workloads reside.

Job Description

The Sr. SRE will be primarily responsible for developing and integrating automation scripts into our CI/CD pipeline and handling the ops responsibilities of the engineering public cloud infrastructure. The role also builds updated Linux and FreeBSD tool packages and qualifies them for field deployment. The Engineering team develops innovative solutions that are transforming the internet security business, and millions of users rely on our service for data protection and comprehensive security. We also provide real-time analytics to our customers for unmatched visibility and maintain a state-of-the-art NOC. The development team works on web filtering, policy enforcement, next-gen firewall, sandboxing, DLP, and mobile user solutions. Being responsive to our customers and delivering industry-leading mission-critical solutions require precise engineering and a philosophy of continuous improvement, both areas in which Zscaler engineers excel.

Responsibilities/What You’ll Do
• You will design, build and operate reliable and secure Cloud infrastructure.
• You will orchestrate end-to-end monitoring and alerting.
• You will configure Cloud resources using Infrastructure as Code frameworks like Terraform, CloudFormation and Ansible.
• You will work in private or public cloud such as AWS, Azure or GCP.
• You will develop tools and test suites that will be integrated into the engineering CI-CD pipeline running on Bamboo and Jenkins Infrastructure.
• You will automate Infrastructure deployment and updates in OpenStack, AWS, Azure and ESXi environments.
• You will automate and maintain upgrades of Cloud applications through Bamboo and Jenkins tools.
• You will analyze Linux and FreeBSD tools and packages and compile/build new packages for production deployment.
• You will configure and monitor engineering cloud infrastructure events and alerts, and take the necessary actions to address the alerts.
• You will create bugs from the issues/alerts observed in the operation of Zscaler Development and Preview cloud and coordinate with the development team to resolve the bugs.
• You will manage and resolve tickets to provide developers with requested infrastructure in OpenStack and VMware ESXi.

• Strong Linux administration, internals, and network troubleshooting – Minimum 5 years.
• Proficiency with programming languages like Python and/or Golang. Shell scripting and understanding of C, C++ code desired.
• Background in Linux/FreeBSD systems.
• Strong experience with public clouds. (Preferably AWS and/or Azure.)
• Work with technologies like Ansible, Terraform, and CI/CD platforms.
• Proficiency with monitoring tools such as Grafana or Prometheus.
• Ensure reliability, scalability, and performance for all our deployments and environments, both internally and externally.
• Automate and create workloads in Bamboo and Jenkins that enable all engineers to develop high-quality, production code quickly.
• Self-disciplined, self-managed, self-motivated and strong sense of ownership, urgency, and drive.

Additional Information

All your information will be kept confidential according to EEO guidelines.

What You Can Expect From Us
• An environment where you will be working on cutting edge technologies and architectures
• A fun, passionate and collaborative workplace
• Competitive salary and benefits, including equity

Why Zscaler?

People who excel at Zscaler are smart, motivated and share our values. Ask yourself: Do you want to team with the best talent in the industry? Do you want to work on disruptive technology? Do you thrive in a fluid work environment? Do you appreciate a company culture that enables individual and group success and celebrates achievement? If you said yes, we’d love to talk to you about joining our award-winning team.

Additional information about Zscaler (NASDAQ: ZS) is available at https://www.zscaler.com.

Zscaler is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.


Site Reliability Engineer at Blue Shield of California

Location: Los Angeles

Blue Shield of California’s mission is to ensure all Californians have access to high-quality health care at a sustainably affordable price. We are transforming health care in a way that truly serves our nonprofit mission by lowering costs, improving quality, and enhancing the member and physician experience.

To fulfill our mission, we must ensure a diverse, equitable, and inclusive environment where all employees can be their authentic selves and fully contribute to meet the needs of the multifaceted communities we serve. Our comprehensive approach to diversity, equity, and inclusion combines a focus on our people, processes, and systems with a deep commitment to promoting social justice and health equity through our products, business practices, and presence as a corporate citizen.

Blue Shield has received awards and recognition for being a certified Great Place to Work, best place to work for LGBTQ equality, leading disability employer, one of the best companies for women to advance, Bay Area’s top companies in volunteering & giving, and one of the world’s most ethical companies. Here at Blue Shield of California, we are striving to make a positive change across our industry and the communities we live in – join us!

Your Role
The Infrastructure and Cloud Services team provides support to design, develop, and improve services, platforms and processes that result in improved end-to-end reliability and maintainability to our mission critical application services. The Senior Reliability Engineer will report to the Senior Manager, Technical Engineering. As stewards of the four golden signals, you will proactively seek out system weaknesses and remediate discovered issues before production issues occur using observability principles, trend analysis, and test resiliency using Chaos Engineering.

Your Work
In this role, you will:
• Design, build and support the application stack in an operationally reliable and cost-effective manner
• Maintain and measure reliability, latency, and scalability for complex systems
• Automate and orchestrate workflows through tools, scripting, and programming
• Troubleshoot, manage, and resolve issues in production environments and collaborate with IT and business teams to implement strategies to eliminate them
• Deliver and support monitoring of business service health including Service Level Agreements
• Perform proactive daily system monitoring including reviewing system and application logs as well as responding to, triaging, troubleshooting and remediating incidents
• Develop automation and processes to enable teams to deploy, manage, configure, scale, and monitor their applications
• Experience with and understanding of deployment strategies: Basic, Blue/Green, Canary, multi-service, Rolling, and A/B Testing
• Use Agile methodologies for CI/CD
• Collaborate with infrastructure and application development teams to incorporate SRE strategies
• Develop CI/CD processes to improve cadence
• Use Chaos Engineering to test what you build under real-world conditions
• Run monthly Chaos Engineering “Game Days”
• Drive architectural consolidation and simplification
• Work closely with internal partners and teams to deliver high quality solutions from ideas to production code. Debug complex problems across an entire stack and create solid solutions
• Conduct post-incident reviews to find out what’s working and what’s not, and improve the process by filling its gaps
• Create and review documentation and process regarding recurring issues, new standard operating procedures, knowledge transfer material, etc.
• Design and build an SRE function that owns application availability and performance, managing them through automation and proactive/predictive alerts and using data analytical toolsets to identify areas of improvement for Dev and Ops teams
• Implement comprehensive service monitoring to ensure uptime and performance, including synthetic, real user traffic, application performance, system level and dashboards
• Define, measure, and meet SLA/SLOs focusing on availability, performance, incidents, and chronic quality issues. Arm developers with deeper insights into application performance and service health issues towards reducing MTTA & MTTR
• Self-Healing for monitoring abrasion – use of desired state within tools such as Ansible Tower or other orchestration tools

Your Knowledge and Experience
• Requires a bachelor’s degree in computer science or equivalent field
• Requires at least 5 years of prior relevant experience
• 5-7 years of experience with software engineering, software development, or production operations in a large-scale environment
• 2+ years using scripting languages such as Bash, Python, and PowerShell and/or others
• Experience designing, building, and operating large-scale production SaaS and PaaS platforms
• Experience with monitoring and observability tools such as Datadog, Prometheus, AppD, and Dynatrace
• Production experience with DevOps ability engineering running applications
• Experience operating in full Agile CI/CD DevOps pipeline…


SRE (Site Reliability Engineering) at SonicJobs

Location: Mountain View

SRE Role
Remote role

Job Description:
• We are looking for a Lead/Senior SRE Engineer who has an expert understanding of ServiceNow, to support the build-out of a strategic operational monitoring framework.
• The candidate should have a history of working on ServiceNow integrations and be able to perform in-depth analysis of performance issues within ServiceNow.
• This role serves as the ServiceNow Subject Matter Expert, identifying and proposing ServiceNow solutions for business development, including helping plan, implement and administer all aspects of the new ServiceNow IT Service Management (ITSM) platform.
• The role will work with Systems Support, Quality Engineering, Change and Release Management, product owners, scrum masters, and with other technology centers of excellence.


Site Reliability Engineer-TikTok at TikTok

Location: Mountain View


About TikTok
TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. TikTok has global offices including Los Angeles, New York, London, Paris, Berlin, Dubai, Mumbai, Singapore, Jakarta, Seoul and Tokyo.

About the Team
Join the fast-growing US Tech Service team at TikTok! We are hiring experienced Site Reliability Engineers (SREs) to build the next generation of highly reliable, large-scale and massively distributed infrastructures. The position is available in our Mountain View, CA; Seattle, WA; Culver City, CA; and New York, NY offices.

What You’ll Do
– Engage in and improve the whole lifecycle of services from inception and design, throughout development, capacity planning, and launch reviews, to deployment, operation, and refinement
– Design and implement software platforms and monitoring frameworks for efficient, automated, and intelligent service-oriented architecture (SOA) governance
– Scale systems sustainably through mechanisms such as automation; evolve systems reliability, efficiency, and velocity by pushing for changes
– Practice sustainable user support, incident response, and blameless postmortems.


Who We’re Looking For
– Bachelor’s degree majoring in Computer Science, or related fields, with at least 2 years of related work experience
– Experience in SRE of large-scale systems deployment with high reliability and scalability
– Familiarity with Linux system operations and networking
– Experience programming in at least one of the following languages: Python, Perl, Go, or C/C++
– Experience in designing, analyzing and troubleshooting large-scale distributed systems
– Familiar with popular CI/CD procedures and environments
– Effective communication skills and a sense of ownership and drive

TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We believe individuals shouldn’t be disadvantaged because of their background or identity, but instead should be considered based on their strengths and experience. We are passionate about this and hope you are too.

TikTok is committed to providing reasonable accommodations during our recruitment process. If you need assistance or an accommodation, please reach out to us at usrc@tiktok.com.


The Tech Career Guru