Senior Site Reliability Engineer (Remote-Eligible) at Capital One
Center 1 (19052), United States of America, McLean, Virginia
Do you love building and pioneering in the technology space? Do you enjoy solving complex business problems in a fast-paced, collaborative, inclusive, and iterative delivery environment? At Capital One, you’ll be part of a big group of makers, breakers, doers and disruptors, who love to solve real problems and meet real customer needs. We are seeking Site Reliability Engineers who are passionate about delivering reliable, scalable, efficient, and highly available platforms. As a Site Reliability Engineer, you’ll have the opportunity to be at the forefront of driving a major transformation within Capital One.
What You’ll Do:
• Collaborate with and across Agile teams to design, develop, test, implement, and support technical solutions for opinionated container orchestration platforms
• Share your passion for staying on top of tech trends, experimenting with and learning new technologies, participating in internal & external technology communities, and mentoring other members of the engineering community
• Collaborate with digital product managers, data science, and tech partners to deliver robust cloud-based solutions that drive powerful experiences to help millions of Americans achieve financial empowerment through machine learning
• Utilize programming languages like Python and Golang, Container Orchestration services including Docker and Kubernetes, CM tools including Ansible and Terraform, and a variety of AWS tools and services
Capital One is open to hiring a Remote Employee for this opportunity.
• Bachelor’s degree
• At least 4 years of experience in DevOps Engineering (Internship experience does not apply)
• At least 2 years of experience with Cloud Native technologies (Amazon Web Services, Microsoft Azure, Google Cloud Platform)
• At least 2 years of Unix or Linux system administration experience
• 2+ years of experience with Terraform or Ansible, CI/CD, Git, and Jenkins
• 2+ years of experience with multi-tenant container orchestration platforms and services including Docker and Kubernetes
• 2+ years of experience with Kubernetes-based cloud-native technologies such as Argo, Kubeflow, Istio, Linkerd, and Dex
• 2+ years of experience with coding and scripting (Python, Bash, SQL, and Golang or comparable languages)
• 2+ years of experience working with Agile Development Practices
At this time, Capital One will not sponsor a new applicant for employment authorization for this position.
No agencies please. Capital One is an Equal Opportunity Employer committed to diversity and inclusion in the workplace. All qualified applicants will receive consideration for employment without regard to sex, race, color, age, national origin, religion, physical and mental disability, genetic information, marital status, sexual orientation, gender identity/assignment, citizenship, pregnancy or maternity, protected veteran status, or any other status prohibited by applicable national, federal, state or local law. Capital One promotes a drug-free workplace. Capital One will consider for employment qualified applicants with a criminal history in a manner consistent with the requirements of applicable laws regarding criminal background inquiries, including, to the extent applicable, Article 23-A of the New York Correction Law; San Francisco, California Police Code Article 49, Sections ; New York City’s Fair Chance Act; Philadelphia’s Fair Criminal Records Screening Act; and other applicable federal, state, and local laws and regulations regarding criminal background inquiries.
If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation, please contact Capital One Recruiting at 1- or via email at . All information you provide will be kept confidential and will be used only to the extent required to provide needed reasonable accommodations.
For technical support or questions about Capital One’s recruiting process, please send an email to
Capital One does not provide, endorse nor guarantee and is not liable for third-party products, services, educational tools or other information available through this site.
Capital One Financial is made up of several different entities. Please note that any position posted in Canada is for Capital One Canada, any position posted in the United Kingdom is for Capital One Europe and any position posted in the Philippines is for Capital One Philippines Service Corp. (COPSSC).
IT: Site Reliability Engineer at Professional Diversity Network
Company Overview
Arrowstreet Capital is a Boston-based systematic investment firm that manages global equity portfolios for institutional investors around the world. Our firm manages approximately $152 billion for over 232 client relationships.
Job Description
The Site Reliability Engineer (SRE) is responsible for providing stellar application support for Arrowstreet Capital’s daily business processing cycle. This individual will demonstrate a high level of responsibility and consistency by contributing to the timely response to and closure of any incidents impacting this critical business function. In addition to supporting the daily production cycle, the PSA will perform releases to install, configure, troubleshoot and maintain our proprietary applications in Development, QA, UAT as well as Production environments. Participation in on-call support is required to rectify possible application outages, which means being available to share a rotating schedule with other members of the team to cover trading team hours of 4am to 7:30pm, EST. After-hours and weekend work is occasionally required.
Responsibilities
• Monitor and respond to data and application processing alerts/failures promptly and professionally; identify and fix gaps and ensure issues are tracked through to resolution
• Seek opportunities for process improvements and make recommendations. Partner with development teams to drive stability, operational excellence, and a culture of efficiency
• Ensure team knowledge is current and forward-looking. Maintain awareness of new products as they are introduced and accurately document/update the knowledge base when applicable
• Liaise with external data providers to resolve connectivity and market data issues
• Review, execute, and verify production changes in strict accordance with procedures defined in change documents
• Take an active role in planned technology events, i.e. business continuity tests, ensuring recovery procedures are accurate and complete
• Maintain a high level of internal client satisfaction
• Identify issue trends and manage appropriately within the leadership escalation path
Qualifications
• A minimum of 5 years’ experience supporting investment teams and their technology applications
• Experience with workflow automation/scheduling utilities
• Experience with data transfer and data load technologies
• Experience with database logic and ability to query using SQL
• Experience with financial services processing and an understanding of financial terms
• Experience with developing and maintaining operational procedures with Confluence or similar tooling
• Understanding of Microsoft Operating Systems and Active Directory. Experience with RHEL is a plus
• Experience with the Change Management process and deployments; Jira Service Desk, VNext, Gitlab, Azure DevOps and similar tools
• Excellent relationship-building skills with ability to interface with various internal support partners to derive solutions and negotiate changes to production
• Critical thinker with the confidence to successfully identify gaps in customer settings and configurations
• Remarkable interpersonal, verbal and written communication skills with an exceptional attention to detail when managing customer inquiries
• Strong work ethic with a positive, team-player mentality
• Ability to work hours required by rotation schedule, noted above
We maintain a friendly, team-oriented environment and place a high value on professionalism, attitude and initiative.
PDN-95812136-8006-4b72-b951-47f918329808
Lead Site Reliability Engineer Opportunity with Boston, MA Based Pet SaaS Company at Motion Recruitment
Location: Boston, MA (Hybrid Remote)
Title: Lead Site Reliability Engineer
– Comfortable with Coding
Salary (% annual bonus):
Min: $160,000 base
Max: $180,000 base
Senior Site Reliability Engineer at Oracle
Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Develop designs, architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning.
Work with the Site Reliability Engineering (SRE) team on the shared full stack ownership of a collection of services and/or technology areas. Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services. Responsible for the design and delivery of the mission-critical stack, with focus on security, resiliency, scale, and performance. Authority for end-to-end performance and operability. Partner with development teams in defining and implementing improvements in service architecture. Articulate technical characteristics of services and technology areas and guide Development Teams to engineer and add premier capabilities to the Oracle Cloud service portfolio. Understand and communicate the scale, capacity, security, performance attributes, and requirements of the service and technology stack. Demonstrate clear understanding of automation and orchestration principles. Act as ultimate escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs). Utilize a deep understanding of service topology and dependencies to troubleshoot issues and define mitigations. Understand and explain the effect of product architecture decisions on distributed systems. Professional curiosity and a desire to develop a deep understanding of services and technologies.
A BS or MS in Computer Science, or equivalent. Knowledge of server hardware and software configuration, networking, standard internet services, scripting languages, cloud computing patterns, technology security and compliance. Experience running large-scale customer-facing web services. Understanding of load balancing technologies and experience with development in programming languages, databases and big data stores, and container technologies. Work involves defining and documenting technical architecture of complex and highly scalable products. A minimum of 5 years of experience running large-scale customer-facing web services.
If you are a Colorado resident, Please Contact us or Email us at ~~~ to receive compensation and benefits information for this role. Please include this Job ID: 95493BR in the subject line of the email.
As a Senior Site Reliability Engineer, you will provide direction in architecture discussions, design sessions and reviews with a focus on improving the reliability of the solution or platform. You will design and develop software to improve the availability, scalability, performance, stability, security and reliability of the solution or platform. You will design and develop software to improve the monitoring and operations for the solution or platform. You will conduct performance analysis and ensure accurate service capacity and demand forecasting. You will drive operational review, incident recovery and post-incident review to identify preventative actions and improvements with stakeholders. You will participate in on-call rotation for the operations of the solution or platform. You will develop strong relationships with solution teams to promote reliability culture and best practices.
• At least 8 years total combined higher education and related work experience, including:
• At least 2 years software engineering work experience
• At least 6 years higher education and/or additional work experience directly related to the duties of the job; including:
• Bachelor’s degree in Computer Science, Computer Engineering, Software Engineering, Data Processing and/or a related field
• Receipt of the appropriate government security clearance card applicable for your position
• Due to the client contract you will be assigned to, this position requires you to be a U.S. citizen
• New Relic
• Process improvement
• Willing to work additional or irregular hours as needed and allowed by local regulations
• Work in accordance with corporate and organizational security policies and procedures, understand personal role in safeguarding corporate and client assets, and take appropriate action to prevent and report any compromises of security within scope of position
• Perform other responsibilities as assigned
Diversity and Inclusion:
An Oracle career can span industries, roles, countries and cultures, giving you the opportunity to flourish in new roles and innovate, while blending work life in. Oracle has thrived through 40+ years of change by innovating and operating with integrity while delivering for the top companies in almost every industry.
In order to nurture the talent that makes this happen, we are committed to an inclusive culture that celebrates and values diverse insights and perspectives, a workforce that inspires thought leadership and innovation.
Oracle offers a highly competitive suite of Employee Benefits designed on the principles of parity, consistency, and affordability. The overall package includes certain core elements such as Medical, Life Insurance, access to Retirement Planning, and much more. We also encourage our employees to engage in the culture of giving back to the communities where we live and do business.
At Oracle, we believe that innovation starts with diversity and inclusion, and to create the future we need talent from various backgrounds, perspectives, and abilities. We ensure that individuals with disabilities are provided reasonable accommodation to successfully participate in the job application, interview process, and in potential roles to perform crucial job functions.
That’s why we’re committed to creating a workforce where all individuals can do their best work. It’s when everyone’s voice is heard and valued that we’re inspired to go beyond what’s been done before.
Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.
Oracle is an Equal Employment Opportunity Employer*. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.
* Which includes being a United States Affirmative Action Employer
Site Reliability Engineer at Alert Innovation
Alert Innovation is a fast-growing company on a mission to reinvent retailing through robotics. A diverse, driven, and creative team of professionals, we work daily to design systems that are changing the world of customer fulfillment. We’ve partnered with the world’s largest retailer to develop our Alphabot® technology, which is currently being deployed at stores throughout North America.
We’re looking for a Site Reliability Engineer who:
• Seeks to build – projects to products, systems thinking, modern SDLC pipelines
• Always seeks to minimize constraints, improve lead time, reduce MTTR, and improve the DEV experience
• Has a developer-first mindset with a passion for solving Dev and Ops problems with code
• Strives for operational excellence, tearing down silos and always improving feedback loops
• Has a strong urge to automate all manual processes
• Embeds into product teams and runs the production environments by monitoring SLOs, SLIs and SLAs
• Helps build software and systems to manage platform infrastructure and applications
• Improves reliability, quality, and time-to-market of our suite of software solutions
• Measures and optimizes system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
• Provides primary operational support and engineering for multiple large product engineering groups
• Shares our values and works in accordance with those values
What you’ll work on:
• Create the automated SDLC pipelines inside an embedded product team with a keen eye on pushing the bar on DEV experience and productivity.
• Build and set up new development tools and infrastructure, and train DEV teams to be self-service as a longer-term goal, leveraging the standardized DevOps tooling.
• Help build out scalable infrastructure for the internal pipeline and tools.
• Support the Product groups as the local SRE to optimize systems performance and automation.
• Help build out the rigorous testing and automated quality gates required to accomplish continuous test.
• Balance a 25% DEV and 75% SRE workload within an operating agile/scrum team.
What we’re looking for:
• 1-2 years Development team experience
• 3 or more years DevOps or related experience (release engineering, QA, Ops, field, etc).
• Expertise in CI/CD toolchain of choice – Jenkins, GitLab, Artifactory, and test integration. Expert in CI/CD/CT
• Strong background in Python
• Background working with Java or Golang (nice to have)
• 1+ years of Kubernetes and Docker
• Expertise in at least one configuration management tool (Ansible, Puppet, Chef, etc.)
• Expertise in at least one scripting language of choice (with ability to learn others)
• Strong communication and collaboration skills.
• B.S. in Computer Science or equivalent background experience.
What we offer:
• Highly competitive pay for the Boston market in which we’re based
• Great benefits — medical, dental, vision, life insurance, disability, 4% 401K match, flexible spending accounts, HRA’s, and up to 12 weeks of parental leave
• Unlimited paid time off — we encourage Alertians to take time off to re-energize and we trust our team to make choices that work for them and their team
• Company-wide holidays — we take time off as a company and recognize nine holidays throughout the year
• Equity — every full-time employee receives stock options, because you should have an ownership stake in what you build
• Learning & Development — we’ll reimburse tuition and learning opportunities both on and off the job
• Flexible Work Schedules — we offer a flexible work environment inclusive of remote and hybrid work schedules based on team requirements
• Employee Assistance Program — to assist with your emotional health, parenting, eldercare, nutrition, legal, and financial consultation needs
• Food and Fuel — we have the best coffee in the Boston area and it’s roasted by our founder! Lunch is provided daily for our team in the office and our kitchens are fully stocked with snacks and beverages
Learn more about why we were named a 2022 Best Place to Work at alertinnovation.com/careers.
Alert Innovation offers a safe work environment for its employees and partners. All employees, contractors, interns, and visitors are required to be fully vaccinated. Additional COVID precautions, such as wearing face masks, hand washing, and hand sanitizing are also common practices in all Alert Innovation facilities. We are a flexible work environment and offer on-site and hybrid work schedules based on team requirements.
Alert Innovation is proud to be an Equal Employment Opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We do not discriminate based upon race, religion, color, national origin, gender (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics.
Director, Site Reliability Engineering at Toast
Toast is driven by building the restaurant platform that helps restaurants adapt, take control, and get back to what they do best: building the businesses they love.
Bready* to make a change?
We are looking for a passionate and experienced engineering leader to join the Toast Platform to run our Site Reliability Engineering (SRE) team.
The Site Reliability Engineering team is responsible for running Toast’s production services with a commitment to quality, reliability, and low latency — without needing heroics. The team accomplishes this goal by enabling the rest of Toast R&D to do good design, conduct extensive testing, and establish repeatable processes. SRE helps up-level R&D by evangelizing and enforcing reliability practices across all teams. Finally, the team provides rapid response and resolution to incidents. This team
• Owns and drives Observability throughout the entire stack
• Facilitates accountability through SLOs across all lines of business
• Improves Service Resilience by enabling performance testing and chaos engineering
• Supports Core Services in production
• Provides incident response services for critical issues and facilitates blameless postmortems
About this roll* (Responsibilities)
As a Director of the Site Reliability Engineering team, you will be responsible for building out the Toast SRE team and will oversee the continuous improvement of the operational metrics for all of Toast’s systems. You will have the following tools at your disposal:
• Developing Toast’s SRE roadmap to optimize reliability and minimize MTTR
• Growing a global organization through hiring and creating professional growth opportunities
• Establishing strong working relationships with peer infrastructure and product teams
• Enabling and mentoring managers and engineers on your team to do the best work they can and rewarding their performance
• Influencing architecture decisions and patterns to optimize resilience and scalability throughout the entire R&D organization
Do you have the right ingredients*? (Requirements)
• Experience managing multiple teams, including hiring and cross-functional collaboration
• Experience in a role with operational or production responsibility
• Deep understanding of systems, networking, and scaling issues
• Direct exposure to Cloud systems, ideally in an SRE/TechOps/DevOps/Production Engineering context
• Hands-on software development/troubleshooting background is a big plus
Our Spread of Total Rewards
• Unlimited Vacation
• Sabbatical opportunity after five years
• Professional Development Reimbursement Program
• Commitment to Employee Wellness through resources such as a quarterly Wellness Stipend
• Various peer and company recognition programs
• 401(k) and matching
• Medical, Dental, & Vision Coverage
• Mental Health Benefits
• Subsidized backup childcare
• Bread puns encouraged but not required
We are Toasters
Diversity, Equity, and Inclusion is Baked into our Recipe for Success.
At Toast our employees are our secret ingredient. When they are powered to succeed, Toast succeeds.
The restaurant industry is one of the most diverse industries. We embrace and are excited by this diversity, believing that only through authenticity, inclusivity, high standards of respect and trust, and leading with humility will we be able to achieve our goals.
Baking inclusive principles into our company and diversity into our design provides equitable opportunities for all and enhances our ability to be first in class in all aspects of our industry.
Bready* to make a change? Apply today
Toast is committed to creating an accessible and inclusive hiring process. As part of this commitment, we strive to provide reasonable accommodations for persons with disabilities to enable them to access the hiring process. If you need an accommodation to access the job application or interview process, please contact
For roles based in the United States: As part of our commitment to the health and safety of our employees and their families, all individuals entering our US workspaces are required to provide proof of full vaccination against COVID-19 unless they have an approved medical or religious accommodation.
Senior Site Reliability Engineer (Remote) at Illumio
Our Engineering team has established a culture based on thought leadership, independence, and responsibility. This powerful dynamic drives us forward as we work to make the digital world a safer place. Those who join us represent the leader in Zero Trust Segmentation and work on a technology stack that ranges from operating systems to distributed applications to UI and visualization. Together, we will continue to build world-class products, driven by people with different perspectives, backgrounds, and a commitment to innovation in a time when the world faces its greatest cybersecurity threats in history.
WHAT YOU WILL ACCOMPLISH:
The SRE team is a growing team that partners closely with Engineering, Support, and IT. We are responsible for the design, deployment, and continuous operation of the Illumio ASP cloud ecosystem. We need your help to take our existing platform to the next level with CI/CD, automated diagnostics/scaling/healing, and more.
• Work on a team responsible for a blend of architecture, automation, development, and application administration
• Develop and deploy solutions from the infrastructure, to the network, and application layers, on public cloud platforms
• Exercise new product features before they’re delivered to our customers (we dogfood heavily)
• Ensure our SaaS platform is available and performing, and that we can notice problems before our customers
• Build the tools to improve speed, confidence and visibility of our SaaS deployments
• Help build security into every step of the software & infrastructure life cycle
• Collaborate with Support and Engineering on customer issues, as needed
• Participate in a periodic on-call rotation (Workload sustainability is important – we don’t want anyone burning out.)
WHAT YOU WILL BRING:
• A DevOps mentality
• Enjoy learning new tools and languages
• Enjoy a collaborative environment
• Have a high attention to detail
• Have a strong customer focus
• Strive to solve traditional operations problems through automation
• Are willing to dig deep into infrastructure and code to solve problems
• A Bachelor’s in Computer Science, Engineering, MIS, or experience in software engineering or a related field
• An enthusiastic self-starter with a commitment to learning, customer empathy, and team communication
• 3 years’ experience deploying, tuning, and maintaining Linux-based, highly available, fault-tolerant web platforms in public cloud providers such as AWS, Azure, and GCP
• 3 years’ experience with a modern programming language. Experience with or a willingness to learn Ruby, Python and Linux shell scripting
• 3 years’ experience with common monitoring, log aggregation and metrics gathering platforms (Icinga, Sensu, Splunk, Telegraf/InfluxDB, et al.)
• 3 years’ experience with common database systems such as MySQL, PostgreSQL, Redis, or similar
• 3 years’ experience with common configuration management & orchestration tools like Chef, Ansible, and AWS services & APIs, or equivalents
BONUS POINTS:
• Experience speaking at industry conventions or meetups (Monitorama, SREcon, VelocityConf, DevOpsDays, etc.)
WHO WE ARE:
Illumio, the pioneer and market leader of Zero Trust segmentation, prevents breaches from becoming cyber disasters. Illumio protects critical applications and valuable digital assets with proven segmentation technology purpose-built for the Zero Trust security model. Illumio ransomware mitigation and segmentation solutions see risk, isolate attacks, and secure data across cloud-native apps, hybrid and multi-clouds, data centers, and endpoints, enabling the world’s leading organizations to strengthen their cyber resiliency and reduce risk.
$80K — $100K
DevOps & Site Reliability
Estimated Salary: $20 to $28 per hour based on qualifications.
Senior Site Reliability Engineer – Opportunity for Working Remotely at VMware
Senior Site Reliability Engineer
The Elevator Pitch: Why will you enjoy this new opportunity?
You share a passion for supporting the development of software that has a significant impact on the world and the future of cloud computing. You love solving problems and learning new things and are looking for a company that helps enable those ideas. Technologies come and go, but that excites you because of the endless possibilities it creates. You are looking for an opportunity to work for a company whose software is utilized by every Fortune 500 company and significantly impacts every industry. You want to be a part of a collaborative environment whose teams care about the product they are creating, how they create it, and the impact it has on customers’ business objectives.
The Test Infrastructure team plays a pivotal role in Engineering, providing services and solutions for Dev/Test teams to develop and test Carbon Black Products with. We’re looking for a Senior Site Reliability Engineer who will deploy and maintain infrastructure and tooling for virtual, physical and public cloud Dev/Test labs. This is a unique opportunity to be part of a high-performing team and gain experience with cutting-edge tools and technologies.
Success in the Role: What are the performance goals over the first 6-12 months you will work toward completing?
• Deploy, support and maintain Dev/Test infrastructure and services in virtual and cloud environments
• Troubleshoot and resolve issues on Windows systems, Linux systems and Docker containers for services such as Jenkins, GitLab, Artifactory, Active Directory, DHCP, DNS, Citrix PVS, Horizon VDI, and much more.
• Monitor the health and performance of all infrastructure and resolve issues using strong troubleshooting skills
• Patch and maintain critical infrastructure servers
• Use configuration management tools like Ansible and Puppet to apply and maintain consistency across infrastructure services
• Support the extended engineering team in troubleshooting test automation failures caused by infrastructure issues
• Work closely with Dev/Test teams to understand their test infrastructure requirements and implement solutions based on their requirements
• Efficiently design and optimize solutions while maintaining a high level of security
• Leverage virtual datacenter and public cloud knowledge to provide Infrastructure as Code solutions for dev/test labs
• Identify and implement new automated solutions to improve agility and speed while eliminating manual tasks. Write automation and scripts to streamline common tasks, configurations, deployments, etc.
• Write Infrastructure as Code to deploy and configure infrastructure, services, and solutions
• Work with a geographically distributed team to drive customer use cases and outcomes
• Support the physical machine lab and datacenter by troubleshooting issues and deploying and configuring new physical machines such as servers, laptops and desktops (if the candidate is located within commutable distance of the Massachusetts office)
What You’ll Bring
• Strong Infrastructure troubleshooting skills with Windows, Linux and macOS systems, containers, service issues, network issues such as firewall, DHCP and DNS related issues, etc
• Solid understanding of OS fundamentals (Mac / Linux / Windows)
• Strong understanding of Docker and containers
• Experience with monitoring solutions for maintaining system health and availability and troubleshooting issues
• Solid analytical, automation and scripting skills (Python and PowerShell preferred)
• Experience with GitLab, Jenkins, Jenkins Pipelines and Artifactory
• Good understanding of virtualization and experience supporting virtualized environments
• Strong written and verbal communication skills
• Great attention to detail
• Security Experience a plus
• Solid understanding of networking is a plus
• AWS, Azure, Kubernetes and vSphere experience is a plus
What type of work will you be doing? What assignments, requirements, or skills will you be performing on a regular basis?
As a new member of the Test Infrastructure Team, you will:
• Write technical design documents and hold reviews
• Participate in all team scrum ceremonies
• Design, develop, support and maintain solutions required by Dev/Test teams
• Support and maintain all virtual, physical and cloud lab infrastructure
• Monitor and troubleshoot issues with infrastructure services
• Leverage Infrastructure as Code to automate infrastructure deployment and management using tools such as Terraform, Python, Ansible, Puppet, vRA etc.
• Participate in our on-call rotation, providing operational support to our infrastructure and services
• Communicate with internal consumers of the team’s solutions to gather requirements and facilitate the adoption of your solutions
Where is this role located?
This role is based out of VMware’s Burlington, Massachusetts office and is open to candidates in the surrounding area. Working remotely is fine but an ideal candidate is able to commute to Burlington, Massachusetts when needed to support the physical machine test lab.
What are the benefits and perks of working at VMware?
You and your loved ones will be supported with a competitive and comprehensive benefits package. Below are some highlights, or you can view the complete benefits package by visiting ~~~ .
• Employee Stock Purchase Plan
• Medical Coverage, Retirement, and Parental Leave Plans for All Family Types
• Generous Time Off Programs
• 40 hours of paid time to volunteer in your community
• Rethink’s Neurodiversity program to support parents raising children with learning or behavior challenges, or developmental disabilities
• Financial contributions to your ongoing development (conference participation, training, course work, etc.)
• Wellness reimbursement and online fitness and wellbeing classes
This job may require the candidate to travel and/or work from a facility that requires full vaccination prior to entry.
Category : Engineering and Technology
Subcategory: Software Engineering
Experience: Manager and Professional
Full Time/ Part Time: Full Time
Posted Date: 2022-09-16
VMware Company Overview: At VMware, we believe that software has the power to unlock new opportunities for people and our planet. We look beyond the barriers of compromise to engineer new ways to make technologies work together seamlessly. Our cloud, mobility, and security software form a flexible, consistent digital foundation for securely delivering the apps, services and experiences that are transforming business innovation around the globe. At the core of what we do are our people who deeply value execution, passion, integrity, customers, and community. Shape what’s possible today at ~~~.
Equal Employment Opportunity Statement: VMware is an Equal Opportunity Employer and Prohibits Discrimination and Harassment of Any Kind: VMware is committed to the principle of equal employment opportunity for all employees and to providing employees with a work environment free of discrimination and harassment. All employment decisions at VMware are based on business needs, job requirements and individual qualifications, without regard to race, color, religion or belief, national, social or ethnic origin, sex (including pregnancy), age, physical, mental or sensory disability, HIV Status, sexual orientation, gender identity and/or expression, marital, civil union or domestic partnership status, past or present military service, family medical history or genetic information, family or parental status, or any other status protected by the laws or regulations in the locations where we operate. VMware will not tolerate discrimination or harassment based on any of these characteristics. VMware encourages applicants of all ages. VMware will provide reasonable accommodation to employees who have protected disabilities consistent with local law.
Senior Site Reliability Engineer at Chewy
Chewy is seeking a Senior Site Reliability Engineer in Boston, MA or Minneapolis, MN. Chewy is THE go-to online shopping destination for all things pet and we are continuously striving to delight pet parents with a seamless experience across our platforms. The SRE team works with various teams across the organization to make their services more resilient against failures by applying common patterns and practices, and to scale them to keep up with ever-increasing growth and demand. This includes facilitating resiliency testing, game day exercises and chaos testing to uncover risks and weaknesses before they lead to large-scale production issues.
Do you enjoy working in a fast-paced environment, solving complex technical problems, and delivering innovative solutions? If you have a passion for solving complex problems unique to running large, highly scalable, resilient systems, we would love for you to join us. The role will have tremendous visibility in the technology & business organization of Chewy. This is a high-profile position that will have exposure across the entire business, influencing the vision and implementation of architecture, design and features of Chewy’s technical platform.
What You’ll Do:
Contribute to the development of our self-service chaos platform.
Enable engineering teams to make their services more reliable by identifying, creating, and deploying engineering practices, processes, and solutions.
Establish monitoring tools and management dashboards integrated into platforms with best practice notifications and response processes.
Define and document best practices and strategies regarding application deployment and infrastructure maintenance.
Educate teams on the implementation of new cloud-based initiatives, providing associated training as required.
Employ exceptional problem-solving skills, with the ability to see and solve issues before they affect business productivity.
Improve availability, reliability, and observability of Chewy services and reduce the burden of human toil with tooling and automation.
What You’ll Need:
7+ years of experience in a software engineering, SRE or performance engineering role.
5+ years of hands-on experience designing and developing scalable, high performing and fault-tolerant applications for large enterprises.
Expertise in developing executive-friendly dashboards based on observable metrics in IT systems (KPIs, incident trends, MTTR, MTTD, etc.).
Hands-on working experience with issue tracking tools and source control systems (GitHub).
Experience with infrastructure tools, container technology (Docker), public cloud providers (AWS, Google Cloud, Azure), configuration and deployment management (Terraform, Ansible), continuous delivery infrastructure (e.g., Jenkins) and orchestration (Kubernetes, Fargate).
Excellent understanding of micro-services architecture, design patterns, and standard methodologies with an eye towards scale, automation, resiliency, and high availability.
Experience with telemetry tooling and observability systems such as Prometheus, Splunk, DataDog and Grafana.
Leverage automation to improve deployments and updates, speed up problem detection/resolution, and ensure safe and quick rollback when problems occur.
A Bachelor’s degree in Computer Science or related field or equivalent experience.
Position may require travel.
CDN & DNS experience is a plus.
Incident management and on-call experience.
Experience contributing to the architecture and design (architecture, design patterns, resiliency and scaling) of new and current systems.
Expertise in ITSM processes and tools like JIRA and PagerDuty, and experience with ServiceNow ITOM and ITSM modules that focus on Incident, Problem and Change Management.
Senior Site Reliability Engineer at DraftKings in Boston, MA
Building the possibilities. We’re growing rapidly. As a Senior Site Reliability Engineer, you will help us continue running our applications smoothly as our business scales. DraftKings solves some of the most interesting challenges in the tech industry, and when you join our team, you’ll have the opportunity to see your ideas and solutions directly impact our products.
What you’ll do as a Site Reliability Engineer:
• Create self-provisioning infrastructure using tools like Chef, Terraform, and Docker.
• Define key metrics and SLAs around new web services being created to support our rapid traffic growth. You will design and implement monitoring and alerting strategies to enforce application SLAs.
• Create platform-as-a-service environments where entire subsets of our architecture can be created and destroyed cleanly and reliably in AWS using HashiCorp tools like Packer, Terraform, Vault, and Consul. You will also foster a continuous deployment ecosystem that will allow DraftKings to operate at a massive scale.
• Build serverless functions, build Slack bots, work with bleeding-edge technology, and have the freedom to learn as much as you like.
• Facilitate developers and continuous development of our microservices architecture.
What skills you will use:
• You will have 3 years with cloud environments and provisioning automation.
• Deep understanding of common scripting languages (Python, Bash, PowerShell).
• Experience working with at least one object-oriented language (Java, .Net, etc.).
• Working knowledge of networking and web concepts and ability to debug issues down to the packets.
• Experience with distributed systems and the challenges with operating them as they scale.
• Understanding of CI/CD pipelines.
This position can sit in either Boston or remote; however, this work is to be performed entirely outside of Colorado.
WHO ARE WE A GOOD FIT FOR?
We love working with talented people but more than that, we seek out compassionate co-workers with a collaborative spirit. Our work moves quickly and we’re great at coming together to find creative solutions to some of tech’s most interesting problems. If that sounds good to you, join us.
WE ARE DRAFTKINGS.
We’re inspired by our shared passion for developing creative solutions to complex challenges and empowering the people around us to do their best work. We are industry leaders in the digital entertainment and technology space propelled by constant curiosity and diverse perspectives. Our teams are fueled by innovation. We are looking ahead, building what’s next, and continuously reinventing the industry. We’re a publicly traded (NASDAQ: DKNG) technology company headquartered in Boston, with teams around the world and an expanding global presence.
$80K — $100K
DevOps & Site Reliability
Estimated Salary: $20 to $28 per hour based on qualifications.