Full-time Site Reliability Engineer openings in Chicago, United States on September 14, 2022

Site Reliability Engineer – 3884 at Delphi-US

Location: Chicago

Job Description

Job Title: Site Reliability Engineer (Contract) – Job# 3884

Location: Chicago, IL (hybrid/remote)

Job Description:
Our client is actively seeking a Site Reliability Engineer to join their team on a contract-to-hire assignment.

• Collaborate with development, operations, and infrastructure teams to ensure availability of services, and to work through implementation issues.
• Develop automation for incident response and to prevent problem recurrence
• Create and enhance runbooks to respond to service outages or degradations
• Assess the production readiness of services
• Define and track operational metrics for production performance, reliability, scalability, and availability
• Architect, develop, and maintain shared services and tools to improve reliability and reduce toil across the organization
• Contribute to the team’s continuous improvement through research, retrospectives, discussion groups, and code reviews
• Provide support within the team by guiding and mentoring junior members, and preparing stories for the sprint backlog

• Experience with maintaining and troubleshooting large-scale distributed systems
• Strong experience with Agile / Scrum methodology
• Capable of succeeding in a fast-paced environment with frequent changes
• Comfortable communicating with both technical and non-technical audiences
• Strong documentation skills
• Analytical problem-solving approach
• Self-starter – takes the initiative to research, learn, and deliver; anticipates the play
• Team player – collaborative, humble, and focused on making sure the entire team succeeds

Technical Skills:
• Demonstrated experience managing infrastructure in public cloud environments like AWS (preferred), Azure, or GCP
• Proficient in providing visibility using monitoring and alerting tools like Splunk, SignalFx, AppDynamics, Datadog, StackDriver, Sysdig, Prometheus or Grafana
• Programming/scripting experience in languages like Java, Bash, Python or Go
• Experience with distributed messaging systems like Kafka, RabbitMQ, or ActiveMQ
• Strong understanding of container orchestration systems like Kubernetes, Mesos, Docker Swarm, or Rancher
• Demonstrated involvement in Continuous Integration and Continuous Delivery (CI/CD) tools like Jenkins, Travis, Harness, Spinnaker, Appveyor, CodeBuild, or CodePipeline.

About Delphi-US
Delphi-US is a national recruiting firm based in Newport, Rhode Island. We specialize in IT, Engineering, and Professional Staffing services for organizations from Main Street to Wall Street. Our mission is simple: To connect great people to great companies. We accomplish this with a proprietary skill-based and cultural matching process that results in higher qualified submissions along with increased interviews and offer rates. You’ll find our team is friendly, professional, and ready to advocate on your behalf, armed with industry trends, and a dedication to client expectations.

Equal Opportunity Employer


Site Reliability Engineer (Cloud Platform Operations) at Acronis

Location: Chicago

Acronis is a world leader in cyber protection, empowering people with cutting-edge technology that enables them to monitor, control, and protect the data that their businesses and lives depend on. We are in an exciting phase of rapid growth and expansion and are looking for a Site Reliability Engineer in Cloud Platform Operations who is ready to join us in creating a #CyberFit future and protecting the digital world!

We are seeking a talented Site Reliability Engineer in Cloud Platform Operations to evaluate and recommend new approaches to system administration, system engineering and automation, implement and execute automated scripts and tools to resolve recurring production issues with systems and services, as well as many other engaging tasks to help improve our team.

Every member of our “A-Team” has an instrumental role and impact on the success of Acronis’ innovative and growing business, so we are looking for someone who enjoys working in dynamic, global teams and thrives in a fast-paced and rapidly changing work environment. Just like everyone at Acronis, the ideal candidate will embody all of our company values: responsive, alert, detail-oriented, makes decisions, and never gives up.

What You’ll Do
• Implement and execute automated scripts to resolve recurring Production issues with systems and services
• Triage Production issues, take swift action to restore services and provide root cause analysis
• Perform Production deployments including configuration changes, installations, upgrades and updates
• Improve documentation on data center infrastructure and best practices including architecture diagrams, how-to-guides and run-books
• Evaluate and recommend new approaches to system administration, system engineering and automation
• Participate in on-call activities on a rotation basis

What You Bring (Experience & Qualifications)
• In-depth knowledge of Linux
• System programming experience with Bash or Python
• Experience deploying and managing Kubernetes
• Familiarity with Ansible, ELK (Elasticsearch, Logstash, and Kibana), and Atlassian tools such as Bitbucket, Confluence, Gliffy, and Jira
• Experience with large-scale Production environments maintaining high availability and scalability
• Knowledge of CI/CD pipelines and technologies including Jenkins
• Basic knowledge of TCP/IP including ports and protocols
• Good problem-solving, troubleshooting and analytical skills working with servers, hypervisors, VMs, containers, storage appliances and networks
• Work on multiple projects at the same time with tight deadlines
• Strong written and verbal communication skills, including the ability to create technical documentation and system diagrams
• Bachelor’s / Master’s degree in Computer Science or a related field preferred

Who We Are

Acronis is revolutionizing cyber protection by unifying backup, disaster recovery, storage, next-generation anti-malware, and protection management into one solution. This all-in-one integration removes the complexity and risks associated with non-integrated solutions and offers easy, complete and reliable data protection for all workloads, applications, and systems across any environment—all at a low and predictable cost.

Founded in Singapore in 2003 and incorporated in Switzerland in 2008, Acronis now has more than 2,000 employees and offices in 34 locations worldwide. Its solutions are trusted by more than 5.5 million home users and 500,000 companies, and top-tier professional sports teams. Acronis products are available through over 50,000 partners and service providers in over 150 countries and 26 languages.

Acronis is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, marital status, national origin, physical or mental disability, medical condition, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, gender identity or expression, or any other characteristic protected by applicable laws, regulations and ordinances.


Sr. Site Reliability Engineer – Remote at Jobot

Location: Chicago

Problem solvers, lifelong learners, curious minds, and tech-obsessed engineers needed! Come join us as we grow our world-class NaaS (Networking as a Service) & Cloud Connectivity service and enjoy great culture, comp, and benefits!

This Jobot Job is hosted by Brendan Thomas

Are you a fit? Easy Apply now by clicking the “Apply” button and sending us your resume.

Salary $175,000 – $220,000 per year

A Bit About Us

We’re an award-winning global NaaS & Cloud Storage Provider that offers the most innovative tools in the industry.

Our customers enjoy faster integration, lower fees and full control of where their data is stored + much more.

Want to change the future with us? Apply now!

Why join us?
• World-class team of innovators, lifelong learners, and problem solvers
• Strong compensation up to $200k annually + Bonus + Stock Options
• Fully remote
• Health, dental, vision, and life insurance benefits
• 401k
• Unlimited PTO
• Paid Maternity/Paternity leave and more!

Job Details

• Work with Devs to troubleshoot issues and provide systems level and architecture support
• Expand configuration management systems with innovative features
• Support Devs to bring new software and services to the relevant device
• Solve complex system stability issues
• Recommend new technologies that would strengthen our development and systems
• Automate Everything!

• Python
• Linux
• Cassandra
• RabbitMQ
• Kafka
• Ansible
• Salt
• Terraform
• Chef
• Puppet
• GitHub
• Ubuntu
• Jenkins
• Docker
• Kubernetes
• Nginx
• Postgres

Interested in hearing more? Easy Apply now by clicking the “Apply” button.


Site Reliability Engineer at HealthEquity, Inc.

Location: Chicago

We are looking for a passionate Senior Site Reliability Engineer to join our team in Draper, Utah. Our team is responsible for driving scalable architecture, minimizing risks, and providing visibility across a multitude of environments, systems, and applications while using lean principles at scale in a fast-paced environment. You’ll contribute to the design and documentation of systems, in collaboration with scrum teams, looking for opportunities to automate away waste. You’ll work with scrum teams to troubleshoot complicated systems and applications and will take part in an on-call rotation.

What you’ll be doing
• Work with teams to design and implement automated code deployment solutions
• Work with teams to design and implement automated environment provisioning and container solutions
• Work with teams to design and implement application monitoring and alerting solutions to get issues to the right people at the right time
• Work with teams to remediate issues that impact the health and performance of our production systems and infrastructure
• Work with teams to diagnose and isolate issues at all layers of the stack, whether it be code or infrastructure, during development and in production
• Manage build definitions and hardware in support of our Continuous Delivery policies and procedures

What you will need to be successful
• Bachelor’s degree in CS/Engineering or equivalent experience
• 8+ years of experience in a DevOps, SRE, or IT Operations position
• 2+ years of experience writing SQL queries and stored procedures
• 2+ years of experience developing in .NET and C#
• 4+ years of experience developing PowerShell scripts
• 4+ years of experience working with Terraform
• Demonstrated interpersonal skills and ability to collaborate with product owners and development teams
• Demonstrated ability to context switch while still delivering on commitments
• Ability to troubleshoot complex systems and environments
• Expert in migrating applications to Azure cloud by understanding the monolith architecture
• Experience with CI/CD concepts and tooling using Azure pipelines
• Expert in monitoring alerts on Splunk, Dynatrace, Azure Application Insights, and AVI load balancers
• Documenting upgrades and software maintenance projects to build an accessible record for future requirements
• Extensive experience with 24×7 on-call production support, troubleshooting, debugging, and root cause analysis
• Strong knowledge of TCP/IP networking, SMTP, HTTP, load balancers, highly available network servers, S2S, and Azure ExpressRoute
• Diagnose and repair issues using critical knowledge of Unix processes, MySQL, and related technologies within the OSI stack
• Identify the priority and criticality of incoming alerts, prioritize them appropriately, and escalate issues to service network or operations engineers
• Demonstrated experience in languages such as shell scripting, PHP, Bash, Java, or C; Python is also a plus
• Experience with Kubernetes to orchestrate the development, scaling, and management of Docker containers
• Experience deploying Kubernetes applications with Helm charts; expertise in creating Kubernetes ConfigMaps, ingress, and services
• Experience writing Infrastructure as Code (IaC) in Terraform and Azure Resource Manager. Create reusable Terraform modules in an Azure cloud environment
• Hands-on experience with Azure services, including designing and configuring Azure virtual networks (VNets), subnets, DHCP address blocks, DNS settings, security policies, and routing
• Experience in Azure Cloud services, Blob storage, Active Directory, Azure Service Bus, and Cosmos DB
• Experience building Azure VM availability sets using the Azure portal to provide resiliency for IaaS-based solutions, and Virtual Machine Scale Sets (VMSS) using ARM to manage network traffic
• Demonstrated experience creating Docker images from Dockerfiles, taking Docker container snapshots, and managing Docker volumes to implement Docker automation for a CI/CD model
• Involvement in developing APIs using Kubernetes to manage and specify the copies of containers that run the actual servers in the cloud. Schedule deploys and manage container replicas on a node cluster using Kubernetes, and manage microservices using its Nodes, Pods, ConfigMaps, and selectors
• Knowledge of the Scaled Agile process and planning, and of Azure DevOps concepts
• Knowledge of full stack monitoring concepts and tooling from code to system resources
• Experience with containerization design concepts and tooling

Benefits & Perks
• Medical, Dental, Vision
• HSA contribution and match
• Dependent Care FSA match
• Unlimited Paid Time Off
• 401(k) match
• Paid Parental Leave
• Ongoing Education & Tuition Assistance
• Gym/Fitness Reimbursement
• Award Winning Wellness Program

Come be your authentic self

Why work for HealthEquity

HealthEquity has a vision that by 2030 we will make HSAs as widespread and popular as retirement accounts. We are passionate about providing a solution that allows American families to connect health and wealth. Join us and discover a work experience where the person is valued more than the position.

HealthEquity, Inc. is an equal opportunity employer that is committed to inclusion and diversity. We take affirmative action to ensure equal opportunity for all applicants without regard to race, age, color, religion, sex, sexual orientation, gender identity, national origin, status as a qualified individual with a disability, veteran status, or other legally protected characteristics. HealthEquity is a drug-free workplace.


Site Reliability Engineer at JPMorgan Chase Bank, N.A.

Location: Chicago

As a Site Reliability Engineer (SRE), you’ll help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems. Much of our support and software development focuses on optimizing existing systems, building infrastructure, and reducing work through automation. You’ll join a team of curious problem solvers with a diverse set of perspectives who are thinking big and taking risks. In this environment, you’ll take the lead on relevant projects, supported by an organization that provides the support and mentorship you need to learn and grow. As an SRE, you’ll be focused on running better production applications and systems.

Responsibilities:
• Design, code, test, and deliver software to automate manual operational work
• Troubleshoot priority incidents, facilitate blameless post-mortems, and ensure permanent closure of incidents
• Engage with the development team throughout the life cycle to help develop software for reliability and scale, ensuring minimal refactoring or changes
• Identify application patterns and analytics in support of better service level objectives
• Design self-healing and resiliency patterns
• Design automated software and product upgrades, change management, and release management solutions
• Coach or manage teams as applicable
• Participate in the 24×7 support coverage as needed

Qualifications:
• Bachelor’s degree or equivalent experience in a software engineering discipline
• Expertise in at least one technology stack designing, coding, testing, and delivering software
• Proficiency in one or more technology domains; may be a cross-domain expert able to solve complex and mission-critical problems within a business or across the firm
• Working knowledge of infrastructure components (e.g. routers, load balancers, cloud products, container systems, compute, storage, and networks)
• Excellent debugging and troubleshooting skills

JPMorgan Chase & Co., one of the oldest financial institutions, offers innovative financial solutions to millions of consumers, small businesses and many of the world’s most prominent corporate, institutional and government clients under the J.P. Morgan and Chase brands. Our history spans over 200 years and today we are a leader in investment banking, consumer and small business banking, commercial banking, financial transaction processing and asset management.

We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. In accordance with applicable law, we make reasonable accommodations for applicants’ and employees’ religious practices and beliefs, as well as any mental health or physical disability needs.

The health and safety of our colleagues, candidates, clients and communities has been a top priority in light of the COVID-19 pandemic. JPMorgan Chase was awarded the “WELL Health-Safety Rating” for all of our 6,200 locations globally based on our operational policies, maintenance protocols, stakeholder engagement and emergency plans to address a post-COVID-19 environment. As a part of our commitment to health and safety, we have implemented various COVID-related health and safety requirements for our workforce. Employees are expected to follow the Firm’s current COVID-19 or other infectious disease health and safety requirements, including local requirements. Requirements include sharing information including your vaccine card in the firm’s vaccine record tool, and may include mask wearing. Requirements may change in the future with the evolving public health landscape. JPMorgan Chase will consider accommodation requests as required by applicable law.

Equal Opportunity Employer/Disability/Veterans


Senior Site Reliability Engineer at Uplight

Location: Chicago


The Position

Do you dream about creating a more sustainable future? At Uplight, we are motivating energy users and providers to accelerate the clean energy ecosystem. Working with over 90 of the world’s leading electric and gas utilities, Uplight provides an end-to-end customer energy experience. Uplight delivers personalized experiences that customers have now come to expect–improving satisfaction, increasing revenue, reducing the cost to serve, and contributing to carbon reduction goals. We are B Corp certified, enabling us to put our values into action by not only making decisions for the benefit of our shareholders, but also for our customers, environment, employees, and community.

We are seeking a Staff Site Reliability Engineer to join our team and help us achieve our ambitious goals for our business and the planet.

What you will contribute:

As our Senior Site Reliability Engineer, you will play an integral part in improving the efficiency and way we work in our infrastructure. Your strong collaboration skills will be required to work closely with other engineering teams to ensure services and systems are highly stable and performant, meeting the expectations of our Enterprise customers and users. You will play an important role in making better software and continuously improving the development, integration, and deployment processes. You are passionate about distributed systems and working with highly scalable services in a large multi-cloud environment. You have the ability to prioritize tasks and work independently, and you’re skilled in solving problems with code.
What you get to do:
• Provide guidance and expertise for infrastructure usage to application developers as part of Uplight’s Site Reliability Engineering team.
• You will be on a team focused on Automation, toil elimination, incident response, root cause analysis, and tooling improvement.
• Key to the role will be your ability to teach and mentor other engineers and establish strong relationships with application development customers.
• As a Site Reliability Engineer, you will identify and deliver software improvements using your expertise in software development, complexity analysis, and scalable system design.
• Instrument an observability platform that allows the entire engineering team to detect deviations in performance.
• Contribute to cloud architecture, design, development, and administration to ensure reliability and ease of use for our Application and Platform development teams.
• Guide other team members in the successful completion of technical initiatives as a technical leader.
• Contribute to and create the definition of patterns, best practices, and strategies for continuous integration and deployment pipelines for a variety of architectures for virtual machines, containers, and serverless infrastructure.
• Solve complex problems that require creativity and collaboration to find the right solution for Uplight.
• Enhance the developer experience using best-in-class tools to develop APIs for Infrastructure services.
• Work with others to define, evangelize, and maintain SRE (DevOps) best practices for use across Uplight.
• Collaboratively participate with team members to identify areas for improvement and design solutions.
• Cooperate with architects and product development teams to build reliable services.
• Identify and lead opportunities to improve automation for the company; scope and build automation for deployment, management, and visibility of our services.

Skills and experience are necessary, but we hire on value alignment first, so if you feel you would be a good fit with us, still consider applying.

What you bring to Uplight:
• Expert level experience developing in a Cloud environment (AWS, Azure, Google)
• Advanced level of experience in shell scripting and development languages (Python, Golang, etc)
• Advanced level of experience with container orchestration tools and practices (Kubernetes, etc)
• Experience with configuration management tools such as Ansible
• Advanced level of experience with infrastructure provisioning tools (Terraform, Helm, etc)
• Experience architecting CI/CD pipelines
• Several years in a Software Development or SRE role

Bonus Points:
• GCP/AWS Professional certifications (Architect, DevOps, Developer)
• Experience with cloud infrastructure automation such as Terraform, CloudFormation, or similar
• You have experience in public speaking events
• Solid CI/CD experience
• Automation of multi-step repetitive tasks
• Serverless architecture, such as Google Cloud Run
• Experience with container orchestration tools and practices (Kubernetes, etc)
• You work on the command line confidently and are familiar with all that the Linux toolkit can provide
• You are a Git guru and revel in collaborative workflows

What makes working at Uplight amazing:

In addition to all the standard medical and dental benefits that kick in on Day 1, we are:
• Proud to be 500+ purpose-driven individuals helping to create a more sustainable planet.
• Committed to the environment, our employees, and our communities.
• Focused on career growth by following defined career ladders.
• Committed to taking our work and mission seriously, and we love to laugh!

We also provide:
• 401k Match
• Medical, vision, and dental insurance
• Monthly wellness stipend
• Peer to peer recognition program
• Management by objectives bonus plan
• Innovative flexible time off policy
• Parental Leave
• Exceptionally collaborative and cool office spaces (once we reopen them)

Salary Range: $130,000 to $150,000 USD

In accordance with the Colorado Equal Pay for Equal Work Act, the approximate annual base compensation range is listed above. The actual offer, reflecting the total compensation package and benefits, will be determined by a number of factors including the applicant’s experience, knowledge, skills, and abilities, as well as internal equity among our team.

Uplight provides equal employment opportunities to all employees and applicants and prohibits discrimination and harassment of any type without regard to race (including hair texture and hairstyles), color, religion (including head coverings), age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.


Senior Site Reliability Engineer at High Country Search Group

Location: Chicago

Must be a US citizen. Not currently accepting applicants requiring visa sponsorship.

About the job:

I’m working with a tight-knit company to hire a Lead Site Reliability Engineer who can help build the roadmap for the company’s technical growth and assist in their cloud transitioning …

About The Company:

The company I’m representing is a self-funded, stable “startup” that has NEVER needed to take money from outside investors, and they have NEVER laid anyone off. They have carved out a fair-sized market share in the payments space by specializing in small and medium-size businesses. They process more than $5 billion annually for tens of thousands of small and medium businesses with solutions that make it easy to accept credit and debit card payments in-store, online, and on the go.

Salary Target for this role is $170,000 – $190,000

(100% Remote – Hiring From CA, IL, TX, MN, CO, & GA).

Strong candidates will come with:

— 7+ years relevant experience across either Development, DevOps, or SRE

— Has been the OWNER of an important system with an understanding of Networking (Firewall, Load Balancing, Routing, Switching)

— Strong Linux expertise

— Scripting Abilities (Bash, Perl, or Python)

— Technically led an SRE or DevOps team

— IaC tools (Terraform, CloudFormation, Ansible)

— Monitoring tools (Prometheus, Grafana)

What’s Next?

If this opportunity sounds like a match for your skills and experience, send in a copy of your resume. And if this role is close-but-not-perfect, maybe one of the other 25+ roles I’m working on is. Let’s connect.


Site Reliability Engineering Lead at Northern Trust Corporation

Location: Chicago

General:
• Attend weekly team meetings.
• Submit time records at the end of each week.
• Undertake general tasks that may be allocated from time-to-time.

Recurring Problem Diagnosis (RPD):
• The investigations will be based on our RPR problem diagnosis method, which we will teach you.

The tasks and responsibilities are:
• Conduct Discovery Calls to obtain a problem statement, a high-level understanding of the moving parts of the system to investigate, how the data flows around the system, and the diagnostic data sources available.
• Produce a Diagnostic Capture Plan that describes how the data needed will be captured.
• Help app and infra people to execute the Diagnostic Capture Plan.
• Analyze the data that results to determine the root cause of the problem, or the next steps.
• Issue periodic email-based status reports.
• Attend investigation progress meetings with stakeholders.
• Notify the team leader of blockers or other issues that may arise.
• Assist other SREs in investigations.
• Handle multiple RPDs at any time – this is possible as there can be long pauses in the investigations.
• Undertake projects to improve our ability to solve problems.

Golden Signal Monitoring:
• Use Site Reliability Core (SRC) to identify app and infrastructure services that are missing their Service Availability target or in danger of doing so.
• For the services identified as having a problem, investigate using SRC and other data sources.
• Assess the underlying issue against criteria that we have established and, where appropriate, create a ServiceNow problem record with details of the problem and assign it to the service owner.
• Work with the team that owns the service to help them understand our findings and explain to vendors.
• Assist the service support team in determining the cause of the problem.
• Assist in the operation of our SRC system including onboarding of services, setting availability metrics and fine tuning SLOs.
• Undertake projects to improve our ability to monitor systems and deliver service availability information.

About Northern Trust:

Northern Trust provides innovative financial services and guidance to corporations, institutions and affluent families and individuals globally. With over 130 years of financial experience and nearly 20,000 partners, we serve the world’s most sophisticated clients using leading technology and exceptional service.

Working with Us:

As a Northern Trust partner, you will be part of a flexible and collaborative work culture, which has a strong history of financial strength and stability. Movement within the organization is encouraged, senior leaders are accessible, and you can take pride in working for a company that is committed to strengthening the communities we serve.

We recognize the value of inclusion and diversity in culture, in thought, and in experience, which is why we are honored to receive the following awards in 2021:

Gender Equality Index Member, Bloomberg

Top Financial & Banking Company, Black EOE Journal, Hispanic Network Magazine, Professional WOMAN’S Magazine

We’d love to learn more about how your interests and experience could be a fit with one of America’s best banks and most sustainable companies. Build your career with us and apply today.
Apply Here
For Remote Site Reliability Engineering Lead roles, visit Remote Site Reliability Engineering Lead Roles


Site Reliability Engineer at ICE

Location: Chicago

Job Description

Job Purpose

ICE Mortgage Technology is the leading cloud-based platform provider for the mortgage finance industry. ICE Mortgage Technology solutions enable lenders to originate more loans, reduce origination costs, and reduce the time to close, all while ensuring the highest levels of compliance, quality and efficiency.

This is an exciting opportunity for an SRE Engineer in the SRE Team to provide resilient and secure services. Design reliable, scalable, and stable systems. Build actionable alerts and automation to prevent incidents and detect performance bottlenecks. Quickly troubleshoot issues to restore service.

• Employ deep troubleshooting skills to improve the availability, performance, and security of Ellie Mae Services.
• Ensure services are designed with 24/7 availability and operational readiness and rigor
• Implementation of proactive monitoring, alerting, trend analysis and self-healing systems
• Coding and Automation of Applications on Cloud Platform
• Define and measure KPIs and SLOs
• Implement automated deployments, automated tests, and operational tools
• Collaborate with Product and Support teams to plan and deploy product releases
• Set Strategic and Operational goals for team, and work with team to deliver on goals.
• Work with Cloud Operations leaders to develop narratives, backlog grooming, epic planning and overall sprint planning processes
• Define non-functional requirements as part of the product lifecycle to influence the new designs, standards, and methods for scalable, highly available distributed systems
• Partner with other SREs and lead by example – be a contributor more than a delegator

Knowledge and Experience
• 5+ years of Systems/Applications automation in 24×7 Production Services environments
• BS in Computer Science, Computer Engineering, Math, or equivalent professional experience
• Fluency with one or more current-generation scripting languages used by DevOps professionals (Powershell, Python, Perl, PHP, Ruby) + Java/.NET development
• Excellent troubleshooter, utilizing a systematic problem-solving approach
• Demonstrated experience in designing, analyzing, and diagnosing large-scale distributed systems + Windows Server and/or Linux systems internals (system libraries, file systems, client-server protocols)
• Experience with elastic scalability, fault tolerance, and other cloud architecture patterns
• Proven strength in SaaS services, experience in massive scale web operations
• Experience operating on AWS or other public Cloud (both PaaS and IaaS offerings)
• Experience with Continuous Integration and Continuous Delivery concepts.
• Infrastructure as code utilizing tools like Terraform, CloudFormation or Chef/SaltStack/Puppet/DSC
• Experience in Containerization/Docker/Micro-Services


This role is remote, and the employee may be located in one of the following states: Arizona, California, Connecticut, Washington DC, Florida, Georgia, Illinois, Kansas, Maryland, Massachusetts, Michigan, Minnesota, Missouri, New Hampshire, New Jersey, New York, North Carolina, Ohio, Oklahoma, Oregon, Pennsylvania, South Carolina, Texas, Utah, Virginia, Wisconsin.

Intercontinental Exchange, Inc. is an Equal Opportunity and Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, gender identity, national origin or ancestry, age, disability or veteran status, or other protected status.
Apply Here
For Remote Site Reliability Engineer roles, visit Remote Site Reliability Engineer Roles


The Tech Career Guru