Job Simplification
This job offer was published by Atos and added to the job listings on greenenergyjobsonline.com on 2022-10-22.
The position is in the Engineering category and is located in Chicago, Illinois, US (ZIP code 60290).
The information on this page is not published directly by our websites; we help users find the jobs they want and redirect them to jobs8 to apply for any of the jobs listed on www.greenenergyjobsonline.com.
Job Overview
Join our team!
Atos is the global leader in secure and decarbonized digital with a range of market-leading digital solutions along with consultancy services, digital security and decarbonization offerings.
• Worldwide digital leader
• €11 billion in revenue
• 105,000 employees
• 71 countries
• Olympic & Paralympic Games Worldwide Partner
We inspire candidates and our employees to make the right choices, collectively and individually, to shape the future of the information space.
SRE (Site Reliability Engineering) Job Description
Site Reliability Engineers (SREs) are responsible for keeping production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply engineering principles, operational discipline, and mature automation to our operating environments.
SREs specialize in systems (operating systems, networks, observability) while implementing best practices to continuously improve availability, reliability, and scalability.
As an SRE you will:
Develop and run the SRE team's own tooling and observability using automation such as CI/CD and Kubernetes.
Build monitoring that alerts on symptoms rather than on outages.
Document every action so your findings turn into repeatable actions and then into automation.
Debug production issues across services and levels of the stack.
Plan the growth and reliability of services.
Use your on-call shift to prevent incidents from ever happening.
Take part in an on-call rotation to respond to "Code Red" incidents and help restore customer-impacting service.
You may be a fit for this role if you have some of these inclinations:
Have an urge to deliver quickly and effectively and to iterate fast.
Think about systems: edge cases, failure modes, behaviors, specific implementations.
As an engineer, when you see something broken, you cannot help but fix it.
Have an urge to document all the things so you do not need to learn the same thing twice.
Strong knowledge of SDLC (System Development Life Cycle)
Strong knowledge of git, Docker, Kubernetes, Jenkins, AWS (Amazon Web Services) or similar technologies
Know how to use configuration management systems such as Chef and Ansible
Have strong programming skills in one or more of the following languages: C, Ruby, Python, Shell, Java
Good understanding of hybrid infrastructure
Projects you could work on:
Automation like CI/CD, self-healing of services, end-to-end or performance testing
Improve monitoring (Datadog, AppD, etc.) and build new smart metrics
Develop a relationship with a product group and help define their SLO/SLI
Work directly with AppDev to improve the product's non-functional aspects and production readiness
Improve operability, latency, capacity planning, change management, and MTTR (Mean Time to Repair)
Leveling of Site Reliability Engineering
Technical
Configuration management: use Chef and Ansible to effectively manage our infrastructure
Infrastructure as code: use Terraform and GitLab CI/CD for automation, containerize our environments (Kubernetes), and leverage cloud technologies to meet our goals
Systems: manage, configure, and troubleshoot operating system issues, storage (block and object), and networking components such as VPCs (Virtual Private Clouds), proxies, and CDNs (Content Delivery Networks), and administer high-availability PostgreSQL and Redis clusters
Monitoring and instrumentation: implement metrics in Prometheus and Grafana, log management and related systems, and Slack/PagerDuty integrations
Engineering practices: availability, reliability, and scalability, as well as disaster recovery
Use git and contribute code
Experience coding in one or more of the following languages: C, Ruby, Python, Shell, Java
Execution
Planning: familiar with agile methodologies; use epics and issues to drive projects
Organization: workload organization, OKR (Objective and Key Result) leadership
Management: a manager of one, able to self-organize and report asynchronously
Collaboration and Communication
Leading and contributing to the scoping and design of issues, epics, and OKRs (Objectives and Key Results)
Contributing to the Handbook, creating and updating runbooks and general documentation, and writing blog posts
Completing Root Cause Analysis (RCA) investigations and performing readiness reviews
Improving team practices through code reviews, handoffs of work and incidents
Influence and Maturity
Knowledge sharing and mentoring
Self-awareness, handling conflict in the team, and providing and receiving feedback
Maintaining good relationships with other engineering teams that help improve the product
Accountability: willing to proactively step in and do the right thing while providing candid and constructive feedback
Levels for Site Reliability Engineer
Site Reliability Engineer - 1
Technical
General knowledge of 4 technical expertise areas, with deep knowledge in 1 area
a. AWS Cloud Practitioner; resource provisioning and configuration through the CLI/API
b. Chef (basic syntax, recipes, cookbooks) or Ansible (basic syntax, tasks, playbooks)
c. Working knowledge of CI/CD, Jenkins, Nexus, pipelines, jobs
d. Kubernetes: basic understanding, CLI (Command Line Interface), service re-provisioning
e. Provision and set up metrics in AppD, Grafana, or Datadog
f. Provision and set up logs, and queries for frequently asked questions
g. Networking: VPC, proxies, and CDN (Content Delivery Network)
Working knowledge of git
Execution
Provides emergency response, either by being on call or by reacting to symptoms surfaced by monitoring, and escalates when needed
Proposes ideas and solutions to debug and optimize code and to automate tasks.
Plans, designs, and executes solutions within Card/Bank to reach specific goals agreed on within the team.
Plans and executes configuration change operations at both the application and the infrastructure levels.
Actively looks for opportunities to improve the availability and performance of the system by applying learnings from monitoring and observation
Experience designing, analyzing, and debugging distributed systems
Collaboration and Communication
Self-organizes through issues and epics
Improves documentation across the board, in both application documentation and runbooks, explaining the why and not stopping at the what.
Root cause analysis and corrective actions
Influence and Maturity
Shares learnings publicly through issues, runbooks, documentation, and blog posts.
Contributes to the hiring process by reviewing questionnaires or joining the interview team to assess SRE candidates
Acts as a reliability champion.
Site Reliability Engineer - 2
Technical
General knowledge of most technical expertise areas, with deep knowledge in 2.
a. AWS Cloud Architect / Operations; resource provisioning and configuration via automation
b. Chef (advanced syntax, recipes, cookbooks) or Ansible (advanced syntax, tasks, playbooks)
c. Advanced knowledge of CI/CD, Jenkins, Nexus, pipelines, jobs
d. Kubernetes: cluster provisioning and new services
e. Advanced AppD, Grafana, or Datadog monitoring rules
f. Log shipping pipelines and incident debugging visualizations
g. Advanced networking: VPC, proxies, and CDN
Contributes to Card/Bank codebase to resolve performance and observability issues
Hands-on experience creating self-healing and/or self-service solutions via automation and tooling
Execution
Identifies significant projects that result in substantial improvements in reliability, cost savings and/or revenue.
Identifies changes to the product architecture from the reliability, performance, and availability perspectives with a data-driven approach.
Influences the product roadmap and works with engineering and product counterparts to improve the resiliency and reliability of the product.
Proactively works on efficiency and capacity planning to set clear requirements and reduce system resource usage.
Identifies parts of the system that do not scale, provides immediate palliative measures, and drives long-term resolution of these issues.
Identifies Service Level Indicators (SLIs) that align the team to meet its Service Level Objectives (SLOs).
Experience designing, analyzing, and debugging distributed systems
Collaboration and Communication
Leads initiatives, including problem definition, scoping, design, and planning through epics.
Leverages experience and technical knowledge to perform RCAs/incident reviews and give technical presentations
Performs and runs blameless RCAs (Root Cause Analyses) on incidents and outages, aggressively looking for answers that will prevent the incident from ever happening again.
For stable counterpart assignments, maintains awareness and actively influences stakeholder planning and execution to improve product reliability
Acts as a champion for reliability.
Influence and Maturity
..... click apply for full job details
Employer Overview

Atos
Chicago, Illinois 60290 - Agency