Remote Site Reliability Engineer openings in New York, United States on September 07, 2022

Site Reliability Engineer at GeoPhy

Location: New York

GeoPhy is a technology company in the real estate space. We provide property valuations engineered for the modern world, giving property lenders and investors fast, consistent and reliable access to information. Our technology allows our customers to understand value and its drivers by using both traditional and unconventional sources, using machine learning to create the most accurate valuations in the market.

GeoPhy’s multidisciplinary teams consist of data scientists, engineers, statisticians and economists, using data science and supervised machine learning to optimize the unprecedented volume and variety of data now available in the real estate sector.

We are looking for a Site Reliability Engineer to help build out, maintain, and troubleshoot our rapidly expanding infrastructure on AWS. You will be part of a team creating and maintaining mission-critical infrastructure, ensuring availability, performance and security. We use AWS, Kubernetes, and Terraform to automate and scale our platform.

The impact you will have
• Design, set up and manage our AWS services, including the fleet of EC2 instances, VPCs, AWS Lambda, etc., using Terraform for Infrastructure as Code
• Design, set up and manage deployment environments for Java, Ruby, PHP and Python
• Continually automate, monitor and improve our deployment processes
• Design, set up and manage Docker/Kubernetes
• Identify opportunities and bring solutions to day-to-day challenges
• Explore the AWS product suite to support our business (AWS Neptune, AWS SageMaker, etc.)
• Contribute in a positive way to the atmosphere within the team
• Databases: PostgreSQL, MySQL, Elasticsearch
• Applications: Java, PHP, Ruby, Python, NodeJS
• Web content delivery: Nginx, AWS CloudFront
• On-call duties

What we’re looking for
• Solid AWS knowledge
• Strong knowledge and experience with Kubernetes
• Ability to build reliable and scalable infrastructure via IaC (Terraform)
• Ability to advise technical teams through design and implementation, with a focus on Kubernetes
• Experience in Public Cloud deployments on AWS
• Linux System administration experience
• CLI scripting skills (Bash)
• Working knowledge of system monitoring tools, especially Prometheus
• Independently driven, proactive, accountable, and reliable team player
• Full working proficiency in English
• International mind-set

Bonus points for
• Knowledge of virtualization and containerization (Docker, Kubernetes)
• Knowledge of: Networking, SELinux
• Experience with: CI/CD tools, Maven, SonarQube, Kafka

What we’re offering
• You will have the opportunity to accelerate our rapidly growing organization
• We’re a lean team, so your impact will be felt immediately
• Agile working environment with flexible working hours and location, career advancement, and competitive compensation package
• GeoPhy is a family and pet friendly company
• We arrange social activities to help our employees and families become familiar with each other and our culture
• Diverse, unique colleagues from every corner of the world

If you’re convinced you are the right fit and you can’t wait to join our team, we look forward to hearing from you!

********

Senior Site Reliability Engineer at Carta

Location: New York

The Company You’ll Join

Carta is a platform that helps people manage equity, build businesses, and invest in the companies of tomorrow. Our mission is to unlock the power of equity ownership for more people in more places.

Carta is trusted by more than 30,000 companies and over half a million employees in nearly 150 countries to manage cap tables, compensation, and valuations. Carta also supports over 5,000 funds representing over $100B in assets under administration with its venture capital solutions. Carta’s liquidity solutions have returned $13B to shareholders in secondary transactions. Today Carta’s platform manages over two trillion dollars in equity for nearly two million people globally. Companies and funds like Canva, Tribe, and Pipe build their businesses on Carta.

The company has been included on the Forbes World’s Best Cloud Companies, Fast Company’s Most Innovative list, and Inc.’s Fastest-Growing Private Companies. For more information, visit carta.com.

The Team You’ll Work With

The Site Reliability Engineering team (SRE) at Carta is responsible for ensuring the availability, reliability, and resiliency of the Carta app and other production systems in various environments. The team has expertise in systems architecture and design, infrastructure automation using Terraform, AWS and Kubernetes. In addition, the SRE team collaborates closely with the Information Security team on defining secure network boundaries and implementing security policies.

The Problems You’ll Solve
• Develop and maintain Terraform configs, Jenkins pipelines, Kubernetes manifest files as infrastructure as code (IaC) and extend these configurations to support new services, features and multiple environments.
• Resolve complex dependencies between critical services across various business units and build automation to prevent future problems. Develop automation scripts to streamline system upgrades, and build pipelines to improve the deployment cycle.
• Maximize and maintain high availability of systems and services while ensuring critical business functions are meeting their SLOs.
• Influence new designs and architecture, best practices and standards in supporting and improving technology platforms.
• Establish monitoring and alerting of production systems and critical applications.
• Participate in our on-call rotation to resolve site incidents and document your findings into repeatable runbooks as part of improving site availability.
• Work cross-functionally with a passion to improve developer productivity.
About You

You will be part of a cross-functional team of engineers and product managers, and successful candidates will have extremely high EQ and IQ, with a strong bias towards collaboration.

We’re optimizing for strong senior engineers with at least 4 years of relevant experience who are excited about the opportunity to work with a fast-moving team, as well as previous experience working with:
• Hosting distributed systems on a public cloud provider (GCP or AWS)
• Containerization technologies (specifically Docker, Kubernetes, and Helm)
• Building and working with scalable infrastructures using Linux and Docker containers
• Automation via “infrastructure as code” (using tools like Terraform, Ansible, etc.) and writing scripts in Python and Bash
• GitHub and advanced understanding of CI/CD tooling (Jenkins, CircleCI)
• Production systems monitoring using tools such as Datadog, Grafana, etc.

You’ll build reliable infrastructure via code for the Carta app, which runs on Kubernetes and serves sensitive financial data. You will provide visibility into system and application performance metrics via Datadog monitoring. You will leverage your prior experience in designing, building and maintaining infrastructure with reliability as a core principle to reduce service failures that affect site performance and availability. You will lead by example, demonstrating team collaboration and timely execution of planned projects to enable swifter delivery of software. You are pragmatic in making tradeoffs between different designs to optimize overall business value, and you are passionate about elevating the team by sharing knowledge and teaching. You have a desire to understand and solve people’s problems instead of simply fulfilling requests.

We are an equal opportunity employer and are committed to providing a positive interview experience for every candidate. If accommodations due to a disability or medical condition are needed, connect with us via email at recruiting@carta.com. As a company, we value fairness, helpfulness, transparency, and leadership, and we build our teams around these values. Check out our careers page to get to know us better as you think about your next step at Carta.

********

Site Reliability Engineer at Persado

Location: New York

Who We Are

Persado is the only Motivation AI platform that enables personalized communications at scale to immediately inspire each individual to engage and act. Organizations that use Persado reach a tipping point in their ability to understand their customer, generating powerful, on-brand content and communications that drive value.

As an employer, Persado is committed to creating a place where everyone’s unique perspective is valued. We understand that our team members and our inclusive culture are what make Persado special. Persado is proud to be named on Fast Company’s World’s Most Innovative Companies list in 2020 and Built In’s Best Places To Work in 2021 & 2022.

What We Are Looking For

Persado is looking for a Site Reliability Engineer to maintain and improve both customer-facing and internal systems from an efficiency and resiliency perspective. The role follows EST or CST business hours.

What You Will Work On
• Help us ship existing and new product functionality in our SaaS products using tools such as Python, Kubernetes, AWS, etc., and make sure its performance is aligned with business goals and trade-offs
• Free up resources and reduce waste by automating repetitive tasks
• Diagnose and mitigate problems related to reliability and performance, and learn how to reduce the risk of failure
• Communicate and share knowledge with other teams and individuals, helping your colleagues grow their skill sets
• Invest in your career and your personal growth, with the help of the company’s learning and development budget, by studying subjects of interest, attending events, and generally taking care of yourself

What You Bring
• A commitment to achieving win-win outcomes across different disciplines
• Good writing skills in English
• A minimum of 2 years of experience working in an engineering team in a technical capacity
• Experience writing production-quality code in at least one language (scripting languages included)
• Ability to troubleshoot issues with Unix/Linux servers and networking
• Experience with effective usage of data storage systems (RDBMS, Key-Value, Warehouses, Object stores, etc.)

Also Appreciated
• Experience with configuration management (e.g. configuration-as-code)
• Experience with container orchestration (e.g. Kubernetes)
• Monitoring technologies such as Nagios and Prometheus
• Cloud computing platforms such as AWS, Azure and GCP

What We Offer

Achieve your life goals and work goals at Persado.
• Persado’s hybrid working model empowers both remote and in-office work equitably!
• Competitive and equitable compensation
• Generous benefits packages globally
• 401k matching (USA); Pension Scheme (Certain EU locations) to prepare for your future
• We encourage professional growth through our dedicated enablement and training teams, as well as on demand tools and resources
• $1250 Employee Enrichment Fund to pursue a passion or upgrade your home office!
• Structured onboarding program to ensure a confident start and long-term success for new hires!
• Strong emphasis on career development and mobility, continuous feedback loops and performance management
• Flexible time off to support work-life harmony (including Summer Fridays)
• #PersadoCares! 2 paid Volunteer days per year and $100 charitable donation match
• Robust Diversity, Inclusion and Belonging initiatives; culture month celebrations, monthly diverse speaker series, commitment to bias-free recruitment, ERGs (#culture, #mindsmatter, #parents, #women, #green, #pride and growing)!
• Recognition, Rewards and Ideas to Action programs to recognize the contributions and impact of Persadoans across the globe!

Valuing diversity at Persado means recognizing and respecting human differences and similarities. Persado is committed to diversity with respect to all aspects of employment. All decisions regarding recruitment, hiring, promotion, compensation, employee training and development, and all other terms and conditions of employment, will be made without regard to race, religious beliefs, color, gender identity, sexual orientation, marital status, physical and mental disability, age, ancestry or place of origin.

********

Site Reliability Engineer at Rokt

Location: New York

About Rokt

Rokt is the global leader in ecommerce technology, helping companies seize the full potential of every transaction moment to grow revenue and acquire new customers at scale. Live Nation, Groupon, Staples, Lands’ End, Fanatics, UrbanStems, GoDaddy, Vistaprint and HelloFresh are among the more than 2,500 leading global businesses and advertisers that are using Rokt’s solutions to drive more value through every transaction by offering highly relevant messages to their customers at the moment they are most likely to convert.

With our December 2021 Series E raise of USD$325M, Rokt is expanding rapidly and globally – operating in 19 countries across North America, Europe and the Asia-Pacific region with the largest office in NYC and a major R&D hub in Sydney. With annual revenues of more than US$200M and vibrant company culture, Rokt has been listed in ‘Great Places to Work’ in the US and Australia. Our award-winning culture is guided by our five core values: Smart with Humility, Own the Outcomes, Force for Good, Conquer New Frontiers, and Enjoy the Ride. These values help us attract, engage, and develop the right talent around the globe and ensure we have the right conditions to do our best work. Keen to join a fast-growing company and a vibrant culture? Learn more at rokt.com.

The Rokt engineering team builds best-in-class ecommerce technology that provides personalized and relevant experiences for customers globally and empowers marketers with sophisticated, AI-driven tooling to better understand consumers. Our bespoke platform handles millions of transactions per day and considers billions of data points, which gives engineers the opportunity to build technology at scale, collaborate across teams and gain exposure to a wide range of technology. We are expanding rapidly in our major R&D centers in NYC and Sydney. We are passionate about using intelligent systems to improve the transaction moment for retailers everywhere. Come join us and build the future.

The Role

As a Site Reliability Engineer, you will be part of a team responsible for designing and building high levels of availability, scalability and reliability into our systems. You will become intimate with the architecture of our systems and be responsible for diving deep into code and assisting with architecture and root cause analysis workshops, working directly with feature teams.

Responsibilities
• Design, develop, test, deploy and improve code that solves real world problems
• Manage priorities, deadlines and deliverables
• Operate with autonomy in solving problems
• Collaborate with other teams
• Engage in and improve the whole service lifecycle, from inception and design through deployment and operation
• Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
• Scale systems sustainably through automation
• Evolve systems by pushing for changes that improve reliability and latency

Requirements
• Bachelor’s degree or equivalent practical experience.
• 3 years of hands-on experience in Site Reliability and Observability Engineering: debugging, diagnosing and correcting errors, and resolving high-severity incidents
• Commercial experience in one of the following languages: Java, C#, Python or Go.
• Ability to think about systems: edge cases, failure modes, behaviors, specific implementations.
• You have hands-on development experience with cloud infrastructure and tooling (AWS, GCE, Azure, Kubernetes, Docker, CI/CD pipelines & Terraform).
• Understanding of defensive programming, circuit breakers, resilience frameworks, fault tolerance, and self-healing mechanisms of services.
• Experience working with various monitoring and alerting tools
• Strong organizational and interpersonal skills
• You have handled multiple on call shifts, and have navigated more than one incident through to the retrospective process.
• At Rokt we encourage autonomy; teams have complete ownership of their systems including building, running and monitoring. As such, you may be required to be on-call and respond to systems alerts should they arise.
• Ideas, opinions, and the ability to share them through respectful proposals, presentations, and team-wide discussions
• An eagerness to work and learn in the open and share your learnings with your teammates
• A willingness and comfort communicating remotely through chat, docs, video calls, and other collaborative online tools

Benefits
• Force for Good. We actively invest in the growth of our people and the strengthening of our communities. Our NYC office is 100% vaccinated to keep our employees and community safe and healthy. We require all Rokt’stars as well as anyone else who will be onsite at the Rokt NYC office – clients, contractors, vendors, and suppliers – to show proof of vaccination and their booster shot.
• Work with the greatest talent in town. Our recruiting process is tough. We hold a high bar because we have a high-performing, high-velocity culture – we only want the brightest and the best.
• Join a community. We believe the best things happen when we come together to solve complex problems and make meaningful connections with each other through interest groups, sports clubs, and social events.
• Accelerate your career. Develop through our global training events, ‘Level Up’ investment, online training courses, and our fantastic people leaders. Take your career to Rokt’speed. Grow your career in our rapidly growing company.
• Take a break. When you work hard, we know you also need to rest. We offer generous time off and parental leave policies, as well as mental health and wellness days for all employees. We also offer a paid Rokt’star Sabbatical for employees who have been with us for 3 years or more.
• Stay happy and healthy. Enjoy catered lunch 3 times a week and healthy snacks in the office. Plus, join the gym on us. In the US, access generous retirement plans like a 4% dollar-for-dollar 401K matching plan and get fully funded premium health insurance for your whole family. And our NYC office is dog-friendly.
• Become a shareholder. All Rokt’stars have stock options. If we succeed, everyone enjoys the upside.
• See the world. Along with our global all-staff events in amazing locations (Phuket, Thailand in January 2020; Hawaii in May 2022), we also offer generous relocation packages for those interested in moving to another Rokt office. We have cool offices in great cities: New York, Sydney, London, Singapore, Tokyo.
• Get the best of both worlds with a hybrid workplace. We currently work 3 days a week in office, allowing you to enjoy the best of both worlds (please note: this is subject to change based on the needs of the business and some support roles still require a full time presence). One week per quarter, you also have the flexibility to work from anywhere.
• We believe in equality. Rokt is an Equal Opportunity Employer and recognizes that a diverse workforce is crucial to our success as a business. We would love you to apply for one of our open roles, irrespective of socio-economic status or background, age, gender identity, race, religion, sexual orientation, color, pregnancy, carer/family responsibilities, national and social origin, political opinion, marital, veteran, or disability status.

Salary range: $140,000 – $180,000 / year

********

Site Reliability Engineer // 100% Remote// AZURE at Motion Recruitment

Location: New York

Senior Site Reliability Engineer (SRE)

REMOTE USA

This RSG has a portfolio of brands that span multiple industries providing customers with innovative technologies, information-based analytics, decision tools and data services that help their clients reduce risk and improve decisions to benefit people around the globe.

This company is in search of a Senior SRE with proven industry experience to join our SRE group. We are looking for someone experienced in building infrastructure code and software services (PaaS and SaaS), including security policies, in order to migrate all LNRS applications running in our primary location to Microsoft Azure.

Qualifications:
• 2+ years of experience in DevOps
• Comfort using Terraform
• Knowledge of Kubernetes
• Experience with Azure

Main Responsibilities:
• Ensuring that our database migration from our data center to Microsoft Azure is a success
• Writing and maintaining systems and database documentation for technical and non-technical audiences

Benefits of this role:
• The opportunity to work on a full range of challenging and interesting technologies and help to conquer some of the next generation of problems in risk data analytics
• Get to make a real difference to our customers and society
• Help us to continually evolve and modernize our technology stack, including contributing to our technology radar and the evolution of our products
• The opportunity to learn a multitude of technologies, including but not limited to: AWS, Azure, Docker, Ansible, and Terraform

********

Site Reliability Engineer at Jobot

Location: New York

Site Reliability Engineer- Growing Company!

This Jobot Job is hosted by: Mary Lee
Are you a fit? Easy Apply now by clicking the “Apply Now” button and sending us your resume.
Salary: $100,000 – $170,000 per year

A bit about us:

We are the leader in people-first search advertising, and we are looking for an SRE to implement observability for legacy infrastructure within Kubernetes (EKS Fargate) in AWS. You will have the opportunity to create our infrastructure workflow and coding standards, as well as our observability standards across all of our products.

Why join us?
• Huge Room for Growth
• Great Pay and Benefits
• Work/life Balance
• 100% Remote

Job Details

Responsibilities
• Support on-prem infrastructure
• Work closely with software engineers to implement observability within on-prem and cloud-based environments
• Scale our infrastructure using an infrastructure-as-code mindset
• Join the on-call rotation to support infrastructure
• Solve complex infrastructure challenges related to low-latency large-scale distributed systems
• Create and maintain documentation for runbooks, implementations and infrastructure
• Create automation tools for the infrastructure team as well as software engineers
• Work with the infrastructure team to migrate on-prem infrastructure to a cloud solution

Qualifications and Skills
Must have
• BS in Engineering/Computer Science or relevant work experience in the field
• 4+ years of experience as a Systems or DevOps Engineer
• 2+ years of experience with container orchestration
• Strong knowledge of Unix-based systems
• Strong knowledge of Terraform
• Scripting experience with BASH/Python or the like
• Experience with modern observability tools and the implementation thereof
• Excellent documentation, communication and troubleshooting skills
• A zeal for coding excellence
• Ability to initiate and complete projects with minimal guidance
• Ability to collaborate with others who may not share your technical opinions
• Willingness to learn and support old architectures

Nice to have
• Experience with Kubernetes
• Experience with EKS Fargate
• Experience with Datadog
• Experience migrating on-prem infrastructure to the cloud
• Experience with Puppet/Ansible
• Experience with routers/switches
• Experience with monitoring/Log collection tools such as Nagios, Prometheus, Grafana, Graylog, Logstash, Kibana and Filebeat

Interested in hearing more? Easy Apply now by clicking the “Apply Now” button.

********

Site Reliability Engineer at Altana Technologies

Location: New York

To solve climate change, wealth inequality, supply chain stability, and national security we must change how our global supply chains work. Altana is a Trusted Commerce Platform built on a shared source of truth for the global supply chain. The purpose of Altana is to enable resilient, sustainable, secure, and inclusive global commerce: Globalization 2.0.

We have built a layer of shared intelligence across the world’s supply chain information: a living map of trillions of dollars of B2B commercial activity, covering 400M companies connected by billions of shipments. This knowledge graph powers Altana’s Trusted Commerce Platform, the Altana Atlas, which, only three years after Altana’s founding, is already used by many of the world’s most important governments, enterprises, and logistics providers.

Our product suite enables our customers to gain unprecedented visibility, benefit from shared artificial intelligence across a federated network of data, and interact across the network through a shared source of truth. We help our customers to build and manage trusted global supply chains.

The Engineering team is looking for talented Software Engineers to help build this vision. You’ll work closely with our Data Scientists on projects to analyze and observe world-scale datasets, write code that can scale to produce never-before-seen insights, and construct APIs to deliver our product vision.

This position can be worked remotely, but you should be comfortable working on New York time.

Responsibilities
• Use Terraform to build and operate multi-cloud environments (AWS, Azure, etc.)
• Write Python code for automation and developer tooling
• Write and manage Helm charts and ArgoCD manifests for multiple Kubernetes deployments
• Enable monitoring and observability tooling across our team’s entire stack
• Maintain security standards, role-based access controls, and other compliance-related needs
• Be responsible for automating, testing, and deploying your work
• Collaborate with fellow engineers and data scientists across the organization

About You
• BS or MS degree in Computer Science, Data Science, or equivalent experience
• You have 5+ years of real-world professional experience building developer tools or infrastructure automation
• You have a track record of ownership and delivery of projects with major organizational impact
• You care deeply about engineering excellence, clean code, and knowledge-sharing
• You have strong written and verbal communication skills

Nice To Have, But Not Required
• Experience with Python Machine Learning toolsets (Scikit-learn, Numpy, Pandas, Dedupe)
• Experience with API development
• Experience with GitHub Actions, GitLab CI, or other CI/CD tooling

Technologies we love
• Languages: Python, Go, JavaScript
• Tools: Docker, Git, Kubernetes, Swagger/OpenAPI, AWS, Azure
• Datastores: Elasticsearch, Postgres, Redshift, Neo4j

Salary Range: $100,000 – $190,000 (Negotiable and dependent on experience)

Why it’s great to work at Altana
• We love to collaborate, and we win as a team
• We are committed to engineering excellence
• We value personal and professional development
• We learn from diverse backgrounds and perspectives
• We impact the world, from enabling developing countries to identifying drug traffickers

Altana is an equal opportunity employer with a commitment to inclusion across race and ethnicity, gender, sexual orientation, age, religion, physical ability, veteran status, and national origin. We offer a comprehensive healthcare package and paid parental leave of 3 months for the primary caregiver and 1 month for the secondary caregiver.

********

Site Reliability Engineer SRE at Request Technology

Location: New York

Salary Information: 120-150K + Bonus
Reference #: CJ-SREnj
Visa Requirement: US Citizenship / Permanent Resident
Recruiter Name: Craig Johnson
Location Type: Berkeley Heights, New Jersey

Overview
• We are unable to sponsor visas for this permanent full-time role
• This position is bonus eligible

A prestigious Fortune 500 company is currently seeking a Site Reliability Engineer. The candidate will provide operational support for the products to meet SLOs and SLAs.

Responsibilities:

• Working closely with development teams to implement and improve SLIs and SLOs for their services
• Identifying and developing processes, tools, automation, infrastructure improvements and software changes to address top operational issues
• Exerting technical influence to shape the implementation of products and establish strong operational readiness across teams
• Utilizing hands-on technical skills to partner with team members, and being comfortable diving into the fray as needed
• Diagnosing complex problems, developing metrics to measure them, and implementing monitoring solutions to manage them
• Building automation and systems for software and hardware lifecycle management
• Using your programming experience to reduce toil
Qualifications:

• 5+ years in a Reliability Engineering, DevOps, or Infrastructure-focused role
• Strong experience scripting in a language such as Python, PowerShell, or JavaScript
• Passion for designing and building reliable systems
• Automation advocate with an automation-first approach
• Experience deploying, supporting, and monitoring new and existing services, platforms, and application stacks
• Strong experience supporting customer-facing applications on Windows/Linux platforms
• Monitoring experience with tools such as Splunk, ExtraHop, and Prometheus
• Strong fundamental understanding of networking and security
• Knowledge of TCP/IP networking, architecture, and core technologies (such as DNS, DHCP, HTTPS)
• Excellent written and verbal communication skills, to share your knowledge, teach what you know, and learn new ways of doing things from your team
Preferred Skills:

• Demonstrated experience building or maintaining highly available systems at scale
• Experience with CI/CD pipelines that support a SaaS product
• Experience with capacity planning practices or methodologies


********

Site Reliability Engineer (Cadence) at Instaclustr

Location: New York

Be a part of our growing, talented and highly motivated team
Enjoy a flexible work environment
Work with a business that shares your love of all things Open Source

Instaclustr announced its acquisition by NASDAQ giant NetApp and now operates under the umbrella of Spot by NetApp. This is an amazing advancement in our ability to provide career growth and opportunities for all our employees.

Instaclustr was recently named one of Deloitte’s Fast 500 companies in both 2020 and 2021, following back-to-back years of monumental growth. Since our founding in 2013, Instaclustr has grown to more than 200 large-scale customers spread across the globe and more than 7,500 servers under our management. We have offices in Australia, the USA, and Europe and employ over 300 people worldwide. Keen to find out more? Read on.

The Role

Our TechOps Engineers are the frontline team keeping our large fleet of cloud-hosted Apache Kafka, Cassandra, Elasticsearch, Redis, Spark and PostgreSQL clusters up and running. Every day you will diagnose and solve challenging and interesting technical problems, providing a service that some of the leading global names in tech rely on to deliver for millions of end users.

The Site Reliability Engineer (Cadence) role is focused on helping us to shape and deliver new offerings for Cadence.

I’m interested. What else will I be doing?

Working with our Managed Service product development team to establish Cadence operational requirements and support procedures for our Cadence offering.
Responding to customer queries and incidents, diagnosing and solving complex technical issues by liaising with customers’ engineers, primarily on Cadence. This will include written communication via support tickets and occasional video-call based support. The role will also provide an opportunity to gain knowledge of Apache Cassandra, Kafka, Elasticsearch, PostgreSQL and other supported technologies.
Assist and mentor Level 1 team members to develop their technical capabilities in Cadence
Undertake complex cluster operations such as migrations, upgrades and maintenance on our fleet of 7500+ nodes
Provide expert operational support to our nodes running in the cloud (AWS/Azure/GCP), using technologies such as Linux (Debian), Docker, and languages including Java, Python and bash.
Investigate issues and apply standard maintenance procedures to optimise the performance and stability of production systems
Liaise with the development team through all stages of the development cycle to ensure proper release processes/procedures are being followed
Develop and continually improve our suite of internal automation tools, applications, and processes
Be a proactive, reliable and supportive member of the TechOps team, and participate in a rotating shift roster

Skills & Experience:
We’re looking for smart engineers with exceptional communication skills, a positive attitude, and a passion for IT and learning new things. We expect you to be, or quickly become proficient in the range of technologies we use.
You must have at least 1 to 2 years of working experience, in addition to:

Installing and managing a large fleet of Cadence clusters
Production experience tuning the operating system (TCP, kernel options, etc.) to get the best out of Cadence, and monitoring Cadence clusters (preferably using Prometheus, Grafana, or other Cadence monitoring/management tools)
Strong Linux skills and experience in cloud environments (preferably AWS, GCP, or Azure) are a must. You should be comfortable working from the command line; this is essential, as there are no GUIs here.
Good fundamental computer science / software engineering skills and knowledge, particularly operating system internals, memory management, and networking.
Ideally, programming and scripting skills in languages such as Python, Java, bash, and SQL, plus experience with Ansible and source code control using Git.
Exceptional ability to communicate clearly and professionally in written and verbal English (essential).
Follow required processes and procedures.
Work as part of a team and use your initiative to get things done.
Passion for all things IT, and especially open source.
Customer service experience of any kind is favourable.

What’s in it for you!

Generous benefits including:

Free private health care
Quarterly wellness days
401k
5 volunteer days per year to give back to the community
Donate employer funds to a charity of your choice
Disability benefits
Employee Stock Purchase Programs

Workplace flexibility and great work-life balance
A fantastic team environment
Opportunities to learn from incredibly talented people
Extra monetary contribution to further your professional studies
Exciting and fast-growing industry
Company SWAG
A well defined career model and opportunities for progression

_Trust, Team and Tenacity. These are the core values that stand at the forefront of who we are. Our values enable people from all walks of life to work together in an environment that fosters belonging and empowerment. We take immense pride in our diverse team and continue to set the standard throughout the Open Source community._

_Being unique is powerful. We promote an environment where you can bring your whole self to work each and every day._

U.S. Residents Only: In accordance with NetApp’s Policy, all U.S. employees of NetApp must be fully vaccinated against COVID-19 if they work at a Company location or remotely. If there is a reason preventing you from receiving the COVID-19 vaccination, you must request and be approved for one of the legally acceptable exemptions and reasonable accommodation must be established.

Job Type: Full-time
Apply Here
For Remote Site Reliability Engineer (Cadence) roles, visit Remote Site Reliability Engineer (Cadence) Roles

********

Site Reliability Engineer // 100% Remote// AZURE at Motion Recruitment

Location: New York

Senior Site Reliability Engineer (SRE)

REMOTE USA

This RSG has a portfolio of brands spanning multiple industries, providing customers with innovative technologies, information-based analytics, decision tools, and data services that help their clients reduce risk and improve decisions to benefit people around the globe.

This company is in search of a Senior SRE with proven industry experience to join our SRE group. We are looking for someone experienced in building infrastructure as code and software services (PaaS and SaaS), including security policies, to migrate all of the LNRS applications running in our primary location to Microsoft Azure.

Qualifications
• 2+ years of experience in DevOps
• Comfortable using Terraform
• Knowledge of Kubernetes
• Experience with Azure

Main Responsibilities
• Ensuring the success of our database migration from our data center to Microsoft Azure
• Writing and maintaining systems and database documentation for technical and non-technical audiences

Benefits Of This Role
• The opportunity to work on a full range of challenging and interesting technologies and help to conquer some of the next generation of problems in risk data analytics
• Get to make a real difference to our customers and society
• Help us to continually evolve and modernize our Technology stack, including contributing to our technology radar and the evolution of our products
• The opportunity to learn a multitude of technologies, including but not limited to: AWS, Azure, Docker, Ansible, and Terraform

Posted By: Drew Longmore
Apply Here
For Remote Site Reliability Engineer // 100% Remote// AZURE roles, visit Remote Site Reliability Engineer // 100% Remote// AZURE Roles

********

The Tech Career Guru