Full-time Site Reliability Engineer openings in Austin, United States on September 22, 2022

Site Reliability Engineer at Tata Consultancy Services

Location: Austin


Hope you are doing great! I belong to the Talent Acquisition Team of TATA Consultancy Services (TCS), one of the world’s leading information technology companies, for the North America geography. I came across your profile through a portal and would like to discuss opportunities with you. Please have a look at the job description and let me know your interest.

Job Description:

Position – AWS Site Reliability Engineering (SRE)

Role – Full-time

Location – Sunnyvale, CA / Austin, TX

Job Description:

Strong sense of ownership, customer service and integrity demonstrated through clear communication.

Deep understanding of Linux and system administration at large scale.

Coding experience using a high-level programming language such as Python or Golang.
• Operate, monitor, and maintain high availability of the software service for the SaaSOptics product running in a multi-region AWS cloud environment. Continue to automate, scale, and manage our AWS cloud infrastructure. Work with the team to establish service level objectives and monitor to ensure the objectives are met.
• “All offers of employment extended to applicants will be conditional and will require, among other things, that the recipient of the offer of employment submit proof that s/he is fully vaccinated at that time or will submit such proof prior to determining a start date. Individuals with medical issues or sincere religious beliefs that prevent them from getting the vaccine may request an exemption from the vaccine requirement. To the extent State legislation or executive action purports to limit TCS’s ability to require vaccination for individuals who object on a basis other than medical issues or sincere religious beliefs, individual requests for an exception to TCS’s generally applicable vaccination policy pursuant to relevant State regulation will be evaluated on a case-by-case basis. Individuals who receive an exemption from vaccination for any reason may be required to comply with other Covid precautions.”


Site Reliability Engineer at Diligent Robotics

Location: Austin

What we’re doing isn’t easy. But nothing worth doing ever is.

We envision a future powered by robots that work seamlessly with human teams. We build the artificial intelligence that enables service robots to collaborate with people and adapt to dynamic human environments. Join our mission-driven, venture-backed team as we build out our customer-facing operations arm.

The Diligent Robotics Site Reliability Engineer (SRE) works together with engineering teams, IT, and Security to address unique business challenges through comprehensive solutions while taking into account system uptime, reliability, and maintainability. You will instrument and monitor the breadth of our full platform stack (hosts, applications, and performance). In this role you will work closely with our engineering and information security teams to enhance the automated system provisioning and deployment subsystems within codified infrastructure. You will work with developers to create more robust and scalable services independent of cloud implementations. You will help to isolate, trap, and respond to inevitable system failures and develop strategies for continuous monitoring and analysis to reduce both downtime and required manual intervention. You will participate in the on-call rotation to maintain platform SLAs.

• Analyze our current operational toolset for shortcomings and product improvements; provide and implement recommendations.
• Create, configure, and maintain cloud-based infrastructure and services for the rapid development and monitoring of complex robotics and analytics applications.
• Build tools to automate monitoring and management of robot fleets.
• Triage issues as they arise, both on robots and in deployed software.
• Automate common operations to allow Diligent’s robotic fleet to scale exponentially.
• Be an active member of the software engineering team, helping to improve the organization’s SDLC process and minimize the time from code-complete to production.
• Mentor engineers in SRE best practices and modern software engineering.
• Occasional off-hours, on-call work required.

• 5+ years of combined experience in SRE/DevOps or Software Engineering roles in a full stack engineering environment
• Bachelor’s degree in Computer Science, related field, or equivalent experience
• Management of hosting environment, including database administration and scaling an application to support load changes
• Experience soliciting systems requirements, designing, and implementing new platform components leveraging infrastructure or SaaS services.
• Experience working with distributed, fault tolerant systems
• Experience with running a production environment in one or more Infrastructure as a Service cloud providers (AWS or Google Cloud)
• Experience with modern datastores at small to medium scale (Firestore, Redshift, Postgres, Mongo, distributed queues like Kafka, MosquittoMQ).
• Experience with converting monolithic applications to microservices and service discovery technology
• Experience automating system provisioning, configuration, and Infrastructure as Code (CloudFormation, Terraform, Ansible, etc.)
• Solid Linux skills and proficiency in at least one high-level language (e.g., Python).
• Experience working in an agile methodology development lifecycle

Nice to Haves:
• Exposure to systems security requirements, information assurance techniques, and system hardening

• Competitive salary and equity based on experience and contribution
• Opportunity to be part of an exciting startup venture
• Experience working with some of the leading experts in robotics
• Potential to radically change the future of healthcare

COVID-19 Precaution(s):
• Remote-first interview process
• Social distancing & mask guidelines in place for onsite meetings and work (e.g., demos, hardware, in hospitals, etc.)

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.


Associate Site Reliability Engineer at Liferay, Inc.

Location: Austin

About Liferay Cloud

Liferay Cloud has grown from a handful of people into an established group present in all regions of the globe. The Liferay Cloud platform evolved from a simple concept into an enterprise-ready solution. Our customer base now serves hundreds of thousands of users worldwide. The Cloud Customer Care Group is responsible for providing the best experience to our customers as well as offering valuable technical insights that will increase customers’ projects. Oh, we’re also self-funded which gives us the freedom to work on whatever we think brings the most value to customers and communities in the long run.

About You and this Role

As a Site Reliability Engineer, you will be part of a team tasked with setting up and evolving our troubleshooting, observability, and monitoring capabilities in order to provide excellent services to our customers. You will also have responsibilities in the incident management and problem management disciplines, as well as managing and delivering technical debt work for our platform. Working alongside peers around the world, you will share and implement regional best practices. Your ultimate goal is to provide fail-proof services while improving our platform and mentoring other team members along the way. If you thrive as a code sleuth in a fast-paced environment and enjoy the challenge of always facing new problems, we have the right spot for you. Our team is distributed globally, and you have the unique opportunity of learning from and connecting with peers in different regions across the globe. If this sounds like a challenge you are up for, apply today!

Key Objectives
• Troubleshoot platform issues reported by customers
• Develop technical solutions and workarounds to ensure successful client projects
• Participate in projects to enhance the quality of our platform
• Ensure technical solutions are delivered with the highest quality
• Suggest improvements and have a humble, everyone-can-learn-from-everyone attitude

Required Qualifications
• Bachelor’s degree in Computer Science/Engineering or equivalent.
• 1+ years of experience on a software engineering team supporting an active product
• Know at least 1 object-oriented language
• Basic level experience with Linux-based operating systems
Preferred Qualifications
• Experience working remotely and/or with a distributed team
• Basic knowledge of Unix and Linux-based operating systems
• Familiarity with the following programming languages: Bash, JavaScript (Node.js and React), Java, and Go
• Familiarity with Jira and GitHub
• Basic knowledge of GCP or Azure Cloud technologies and OWASP Top 10
• Familiarity with these technologies: Docker, Kubernetes
• Any of the following certifications is a plus:
• Google Cloud Certified Professional DevOps Engineer
• Google Cloud Certified Professional Cloud Security Engineer
• Google Cloud Certified Professional Cloud Network Engineer

What We Offer
• Salary package according to qualifications and experience
• Competitive benefits
• A positive and collaborative work culture – we like to Stay Nerdy!
• Working at a leading open source company with strong growth opportunities

Equal Opportunities Employer – Statement

Liferay is committed to the equal treatment of all candidates, customers and employees and to fostering a culture of dignity at work. Our operating procedure provides for equal opportunities in recruitment and employment with the aim to eliminate discrimination against any job applicant or employee on the basis of race, age, sexual orientation, gender or gender reassignment, religion or beliefs, marital or civil partnerships status, family or dependency status, disability, pregnancy and maternity or membership of a traveling community.


Site Reliability Engineer at VetCentric

Location: Austin

About Us:
VetCentric, a joint venture between Gen3 Technology and PingWind, is focused on delivering outstanding services to the federal government. We have extensive experience in the fields of cyber security, supply chain & logistics management, strategy, business analytics, and IT services such as system design, continuous improvement, virtualization, and data center management. VetCentric is an SBA certified HUBZone company and VA CVE certified Service-Disabled Veteran Owned Small Business (SDVOSB). We operate in 15 states with offices in Washington DC and Northern Virginia.
About The Role:
As a Site Reliability Engineer on our team supporting VA’s Event Management, you have the chance to use your hardware and software skills to improve the monitoring and triage processes supporting the VA. You’ll work in the Enterprise Command Center with the incident management, problem management, and DevOps teams to detect, investigate, and diagnose system problems and defects across enterprise-level applications and technology stacks, and to evaluate and modernize VA Enterprise systems. You will leverage comprehension of workflow systems and application processes within multiple system environments. As a technical SME, you will work with system and network administrators to troubleshoot performance issues and outages. This is your chance to develop your skills in enterprise-level triage and incident resolution while gaining experience in VA system infrastructure. This work may include shift support during weekends, holidays, or off-hours, as required.
Location: Austin, TX, on-site at the Austin Information Technology Center
What You’ll Need:
+ 5+ years of experience with systems administration and operations, including maintaining responsibility for overseeing operational performance, and performing performance trend analysis and alert generation for failed or failing services in a professional IT work environment
+ 5+ years of experience deploying, maintaining, and troubleshooting complex applications at an enterprise scale while working with cross-functional teams
+ 5+ years of monitoring and troubleshooting experience with two or more of the following APM tools: AppDynamics, DynaTrace, Splunk, Aternity, or SolarWinds
+ 1+ years of experience in service virtualization, AWS or Azure Cloud technologies, and SaaS and PaaS implementation
+ Experience with using Microsoft Office, including Word, Excel, and PowerPoint
+ Ability to obtain a security clearance
+ Bachelor’s degree in Computer Science or Engineering and 5 years of experience in a professional work environment or 13 years of experience in a professional work environment in lieu of a degree.
Employment eligibility: Eligible to work for any employer in the United States without requiring sponsorship. Sponsorship is not currently available.
Employment status: W2
Perks working with us:
+ Competitive compensation
+ Comprehensive health, vision, and dental benefits
+ 3 weeks PTO per year accruing from day one
+ 11 days of paid Federal Holidays
+ 401(k) with a matching plan
E-Verify Employer. EOE Females/Minorities/Protected Veterans/Individuals with Disabilities. Gen3 is committed to fostering and empowering an inclusive community within our company. We do not discriminate on the basis of race, religion, color, gender expression or identity, sexual orientation, national origin, citizenship, age, marital status, veteran status, disability status, or any other characteristic protected by law.
Company Perks and Benefits
+ Competitive compensation
+ Comprehensive health, vision, and dental benefits
+ 15 days of leave and 11 days of paid Federal Holidays  
+ 401(k) with a matching plan
+ Annual training budget
+ Fantastic company culture
“E-Verify Employer, EOE Females/Minorities/Protected Veterans/Individuals with Disabilities; VetCentric partners will offer equal employment opportunities to all persons without regard to race, color, religion, sexual orientation, gender, gender identity, age, national origin, physical or mental disability, veteran status, or other characteristic protected by applicable law.”


Senior Site Reliability Engineer at Jobot

Location: Austin

Site Reliability Engineer (Remote) Needed – Leading Education SaaS Platform!

This Jobot Job is hosted by: Stanton Sikorski
Are you a fit? Easy Apply now by clicking the “Apply Now” button and sending us your resume.
Salary: $120,000 – $190,000 per year

A bit about us:

Founded over twenty years ago, we specialize in building a SaaS education platform and are a leading curriculum solutions provider for K-12 students. Our comprehensive, dynamic, and progressive learning technology helps students develop as learners and thinkers. Our platform delivers research-proven, high-quality core and supplemental solutions in math, world languages, ELA and literacy, computer science and biotech, as well as best-in-class K-12 professional learning services. Here, we strive to create an environment where people want to work – one where the larger team comes first, where trying new things (and sometimes failing) is encouraged, and where we pursue our mission relentlessly. We are a major disruptive force in the digital curriculum market by combining world-class research, differentiated technology, best in class content together with a world-class mission-oriented team. Are you passionate about shaping the future of learning?

Why join us?
• Competitive base salary and overall compensation package
• 401(k) with generous company match
• Full benefits: Medical, Dental, Vision, Life, Disability
• Generous PTO, vacation, sick, and holiday schedule

Job Details

We are looking for a Senior Site Reliability Engineer to join our growing team to ensure that the systems that our students and teachers rely on daily are available, reliable, secure, scalable, and satisfying. We apply engineering disciplines to improve user satisfaction and prevent crises, while also responding to the inevitable error. The existing team has a broad base of collective experience, so we can accommodate a range of experience levels for this position. We seek to automate the mundane tasks, so we can research and implement exciting tools and technologies.

This role will entail the following:
• Enhancement and improvements of our SaaS solutions within AWS
• Define, operate, and refine processes for continuous integration and deployment of application software
• Development of CI/CD pipelines, IaaS for cloud native applications
• Configuring services
• Manage and interpret application data and logs to assist customer support teams with escalations to development.
• Design and implement mechanisms for proactive monitoring, alerting, trend-analysis and self-healing.
• Identify opportunities to improve DevOps processes and collaborate with the team for solutions.
• Help define, measure and report on SLIs and SLOs, drive organization to meet SLOs, and support the ability of the company to provide its customers with SLAs.
• Participate in post-incident reviews to better expose system or process gaps.
• Document procedures and site infrastructure.

You should know some of the following:
• Site Reliability, SRE
• AWS, SaaS
• IaaC, CloudFormation (currently used), Chef/OpsWorks, Elastic Beanstalk
• CI/CD, Pipelines, Jenkins, Bamboo, Git, Jira
• Container Orchestration, Fargate, ECS, Kubernetes
• Python, Bash
• System logs/metrics, Splunk
• Application Performance Management Tools, New Relic, Datadog

Interested in hearing more? Easy Apply now by clicking the “Apply Now” button.


Senior Emmi Site Reliability Engineer at Wolters Kluwer

Location: Austin

Job ID:- R0029643
Basic Function:
We are looking for a Senior DevOps Site Reliability Engineer to join our SRE team at Emmi Solutions and help build, monitor, and support our applications and infrastructure spanning on-premises and cloud environments. This is an opportunity to use and expand your knowledge of DevOps processes and practices, and of systems and operations, as we continue to extend our work to cloud platforms. This position is part of an established team and will have opportunities to employ a wide range of technologies in cloud computing, monitoring and observability, containerization, configuration/infrastructure as code, and automation.
Essential Duties and Responsibilities:
+ Partner with Engineering, Security, and IT to build, deploy, maintain, support and monitor a complex microservice-based application
+ Work on a mixture of cloud infrastructure technologies and on-premises (hosted) systems.
+ Design and implement DevOps pipelines and cloud resource deployment utilizing Infrastructure as Code and Configuration Management approaches.
+ Implement and collaborate on solutions that increase the monitoring and observability of systems at scale and detect and alert on trends of information.
+ Define metrics to ensure the high performance and stability of our development and production environments.
+ Enable and implement continuous delivery and continuous integration using Jenkins pipelines
+ Analyze a variety of approaches to SRE / DevOps problems – provide pros and cons of different approaches to the team to arrive at an agreed upon direction.
+ Provide first level support for application software issues in all environments.
+ Prioritize and rapidly troubleshoot issues to ensure maximum uptime and optimal performance for application teams in our production environment.
Qualifications:
+ Bachelor’s degree in Computer Science or equivalent.
+ Minimum 5 years of software related experience required (Site Reliability, DevOps, Release Eng)
+ Cloud experience and architectural understanding of Azure, Docker, and Kubernetes
+ Experience using Infrastructure as Code tools Terraform, Azure RM.
+ Experience with Ansible, Chef, Puppet or other Configuration Management tools with preference to Ansible.
+ Extensive experience managing Linux, Windows VMs.
+ Experience with build and deployment systems including Jenkins, Git, BitBucket.
+ Solid understanding of networking, VNets, private endpoint management, firewall and Azure control plane exposure.
+ Solid understanding of Cloud Authentication, Authorization, and Secrets Management.
+ Azure RBAC/POSIX-ACL – built-in and custom Roles, Security Groups and Scopes.
+ Ability to define and manage projects using a variety of tools and methods. Agile Scrum and Kanban, Jira and Confluence a plus.
+ Experience and deep commitment to promoting a DevOps culture focusing on continuous integration, automated deployment, repeatable deployment and DR.
+ Demonstrate strong problem analysis, problem resolution, and decision making and judgment skills.
+ Demonstrate excellent and effective interpersonal and communication skills (written, verbal and listening), with ability to build positive working relationships with all levels of the organization.
+ Demonstrate ability to plan and excel in a fast-paced and demanding environment.
EQUAL EMPLOYMENT OPPORTUNITY Wolters Kluwer U. S. Corporation and all of its subsidiaries, divisions and customer/business units is an Equal Opportunity / Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or protected veteran status.


(Cloud) Site Reliability Engineer at Jobot

Location: Austin

High career growth potential. 100% Remote Work Flexibility. Paid 100% employee premiums for healthcare and dental.

This Jobot Job is hosted by: James Boyd
Are you a fit? Easy Apply now by clicking the “Apply Now” button and sending us your resume.
Salary: $120,000 – $150,000 per year

A bit about us:

The world’s leading data connectivity platform. They create plug-and-play components for instant data connectivity, fully integrated and automated. 200+ SaaS/Cloud, Database, and App connectors and 10+ drivers and adapters in their product suite. Simplify data connectivity, eliminate data silos, and break down barriers to better integration and insights. Our best-in-class data connectivity delivers seamless real-time integration of all of your data across your entire tech stack.
• ~150 employees (growing fast)
• Privately held
• Founded in 2014

Why join us?

Selling Points:
• High career growth potential. They are looking to hire 80 people a year for the next 3 years. The developers hired today will be in a position of leadership and seniority tomorrow (6-12 months from now).
• 100% Remote Work Flexibility (unable to hire in California and Colorado)
• We pay 100% employee premiums for healthcare and dental
• Generous 4 weeks of PTO
• 100% match in your 401k up to 6% of your salary
• Work/Life balance so our engineers do not get burned out
• Year after year our team is growing so you’ll have career growth opportunities here
• We are a profitable company with sustained growth and stability – make yourself at home here

Job Details

The Site Reliability Engineer is a critical role that will have a significant impact on our cloud platform. We’re looking for someone who is excited about taking ownership of improving the existing infrastructure, designing the future of our cloud platform and working with a diverse team. Attention to detail and eagerness to learn new technologies and systems is critical to the success of this role.

Job Duties:
• Define and help implement infrastructure improvements for our cloud platform
• Support & contribute improvements to the availability, scalability, latency, and efficiency of our cloud platform
• Define and measure production availability, navigating known downtime, and service level outages
• Increase product delivery velocity
• Debug problems at scale for our mission critical services and help our development teams implement lasting fixes to recurring issues
• Execute, debug, and configure CI/CD pipelines
• Analyze service requests and take appropriate action to meet the defined SLA
• Define and implement monitoring metrics and alerts to ensure tools and environments are meeting SLAs for uptime and performance
• Advance Infrastructure-as-Code and GitOps for the Cloud product team

Ideal Background:
• 2+ years of experience working with public cloud infrastructure (Azure preferred)
• Terraform infrastructure as code experience
• Experience with Kubernetes both as a developer and from an operations perspective
• Deep understanding of Linux and containerization
• Experience deploying and operating applications, Java or C# preferred
• Experience with GitOps based workflows
• Database experience (SQL Server preferred)
• Experience with development practices and tools (JIRA, Git, Azure DevOps)
• Experience with messaging systems and APIs
• Working knowledge of networking (e.g., firewall, routing, network topologies, etc)
• B.S. degree in Computer Science

Interested in hearing more? Easy Apply now by clicking the “Apply Now” button.


Java Site Reliability Engineer at Zyreoneconsulting LLC

Location: Austin

Job Role: Java Site Reliability Engineer

Location: Sunnyvale CA

Primary Skills: Java, J2EE, API development (RESTful web services), Splunk, Dynatrace, Logstash, ELK (Elasticsearch), SQL, Unix, CI/CD, Jenkins, Click, Kibana or Grafana.

Job Description:
• Proven working experience in Java, J2EE, and Java/J2EE frameworks, with strong experience in API development (RESTful web services).
• Proven expertise and hands-on experience with Spring.
• Strong hands-on experience and expertise in application support, maintenance, and deployment of CI/CD pipelines using Jenkins.
• Strong hands-on experience monitoring the deployed pipeline using SRE (site reliability) tools such as Logstash or Prometheus, Splunk, and Dynatrace.
• Ability to provide and implement technical solutions to a wide range of difficult problems through on-call support.
• Strong hands-on experience with logging and monitoring tools using Elasticsearch (ELK), Kibana, CLICK, or Grafana.
• Ability to provide and implement technical solutions to a wide range of difficult problems using logging and monitoring tools.


Site Reliability Engineer – Ad Platforms at Apple

Location: Austin

Summary

At Apple, we work every day to build products that enrich people’s lives! Our Advertising Platforms group makes it possible for people around the world to easily access informative and imaginative content on their devices while helping publishers and developers promote and monetize their work. Today, our technology and services power advertising in Search Ads in the App Store and Apple News. Our platforms are highly performant, deployed at scale, and setting new standards for enabling effective advertising while protecting user privacy. The Ad Platforms team is seeking a Senior Site Reliability Engineer for a phenomenal opportunity. Our mission is to enable Ad Platforms to deliver advertisements in a reliable and scalable way that results in incredible user experiences!

Key Qualifications
• 2+ years of experience supporting internet-facing production services and distributed systems.
• Good programming skills in Java, Python, or Go.
• Expertise in operating Linux-based systems, with a proven understanding of their internals.
• Experience with container platforms like Kubernetes.
• Experience building and running infrastructure on AWS, including using services like EKS and MSK.
• Experience with infrastructure-as-code tools like Terraform.
• Experience in leading the deep-dive and troubleshooting of production issues with an active diagnostic call.
• Demonstrated problem-solving ability using creative and innovative thinking, while adhering to a strong sense of ownership, customer service, and integrity demonstrated through clear communication.
• Self-motivated and eager to learn.

Description

As a Site Reliability Engineer you will be responsible for providing the platform for critically important ad-tech systems to maintain constant uptime, scale seamlessly, and allow for new applications and services to thrive. The successful candidate will be highly self-motivated with a passion for excellence, quality, and detail. The SRE will not only support operations, but also work closely with the developers and architects within the team to aid in the design and assist with the implementation to improve stability, security, and scalability.
• Implement and improve our infrastructure and application monitoring and observability capabilities to improve our reliability.
• Engage with application engineering teams to improve service operability and reliability and on-call efficiencies, and to drive incident management and post-mortem analysis.
• Drive production readiness, and improve key areas like capacity planning, configuration management, and observability.
• Design and improve architectures of new and existing systems based on the principles of reliability and high availability with extensive logging and observability.
• Develop expertise in Apple Infrastructure and best practices and bring that to Ad Platforms to run world-class distributed systems.
• Create tooling and automation to improve the operations and operability of our infrastructure and applications.

Education & Experience
• Bachelor’s degree in Computer Science/Engineering field or equivalent.
• Master’s degree preferred.


Site Reliability Engineer at Tesla, Inc

Location: Austin

Tesla, given its high profile in the technology and automotive industries, is uniquely positioned to enable site reliability engineers to have a major impact within our industry, company, and across our customer base. We are currently looking for a Site Reliability Engineer to architect, build, manage, and operate the Infrastructure & applications for Tesla’s Engineering teams including Connected Systems, Firmware, and Autopilot. In this role, you will plan, design, implement, and manage ongoing Infrastructure components that power the development of Tesla’s Vehicles and technologies.
• Work with the team to design, build, and maintain Infrastructure (Virtual, Bare-metal, Container based) for various Tesla Engineering teams
• Diagnose and troubleshoot complex distributed systems handling large volumes of data and develop solutions that have a significant impact at scale
• Participate in building advanced tooling for testing, monitoring, administration, and operations of multiple clusters across multiple geographically distributed data centers
• Be ultimately accountable for the performance, capacity, and high availability of the infrastructure
• Identify sources of manual work related to onboarding new projects from inception to CI/CD pipeline and strive to automate as much as possible
• Reduce metrics around standing up dynamic development environments ranging from a micro-service to a complete stack
• Reduce the time it takes to build, deploy, and configure Infrastructure & Applications in several environments
• Lead problem resolution and coordination activities; troubleshoot issues across the entire stack: hardware, software, application, and network
• Facilitate knowledge sharing by creating and maintaining comprehensive documentation & diagrams
• Strong team player with a high degree of self-motivation and the ability to learn new systems & manage additional technical resources to meet the project requirements
Requirements:
• BS degree in computer science or related engineering degree
• 3 years of hands-on experience with private/public cloud computing, including infrastructure, storage, platforms, and data management
• Experience with traditional enterprise data-center technologies, including compute, storage appliances, virtual machines, and networking
• 3 years of experience in general programming/scripting (Python and BASH) and automation frameworks such as Ansible to manage administration, monitoring, and developing custom plug-ins and workflows
• Extensive experience in Linux system administration, automation & management
• Should be able to quickly identify the root cause and resolve critical issues by looking across multiple layers (storage, OS, network, virtualization, and application / DB stack)
• Strong documentation and communication skills
• Position will require being available to perform occasional maintenance and be available for on-call rotation during non-business hours and over the weekends
• Available for occasional business travel
Salary Range: $80K – $100K
Minimum Qualification: Systems Architecture & Engineering, DevOps & Site Reliability
Estimated Salary: $20 to $28 per hour based on qualifications.


The Tech Career Guru