Senior Site Reliability Engineer

Company: Las Vegas Sands
Location: Dallas
Posted on: April 1, 2026

Job Description:

Position Overview

The primary responsibility of the Senior Site Reliability Engineer (SRE) is to lead reliability engineering initiatives across our Azure estate and Command Center operations. This role focuses on scripting, automation, and observability to ensure uptime, performance, and rapid incident response. The Senior SRE will design and implement monitoring-as-code, optimize alerting, and build self-healing automation that reduces toil and accelerates recovery. As part of our journey from traditional operations toward a mature SRE model, the Senior SRE will partner with product engineering, platform teams, and the Command Center, including Service Desk and Major Incident Command (MIC), to deliver measurable improvements in service reliability.

All duties are to be performed in accordance with departmental and Las Vegas Sands Corp.’s policies, practices, and procedures. All Las Vegas Sands Corp. Team Members are expected to conduct and carry themselves in a professional manner at all times. Team Members are required to observe the company’s standards, work requirements, and rules of conduct.

Essential Duties & Responsibilities

Observability & Monitoring
Architect end-to-end monitoring using Azure Monitor, Log Analytics, Application Insights, and ITRS Geneos.
Implement monitoring-as-code with Terraform/Bicep, including alerts, dashboards, and diagnostic settings.
Create actionable dashboards (Azure Workbooks, Grafana) for SLIs/SLOs and real-time service health.

Alerting & Incident Response
Design alert taxonomies with severity mapping (P0–P4), dynamic thresholds, and escalation policies.
Reduce alert noise and ensure 100% alert-to-runbook mapping.
Support Major Incident Command (MIC) during P0/P1 bridges with technical expertise and rapid remediation.

Automation & Tooling
Build automation using PowerShell, Python, and Azure Functions for alert lifecycle, runbooks, and self-healing workflows.
Integrate with ITSM (ServiceNow/Jira) for automated ticket enrichment and routing.
Eliminate repetitive operational tasks and reduce toil through automation-first practices.

Reliability Engineering
Define and enforce SLIs/SLOs, error budgets, and resilience patterns (bulkheads, retries, timeouts).
Conduct production readiness reviews, chaos drills, and failover rehearsals.
Partner with app teams to embed instrumentation and structured logging.

Governance & Compliance
Enforce desired state with Azure Policy, DSC/Guest Configuration, and drift detection.
Harden networking (VNet, NSGs, Private Link, Firewall), identity (Entra ID), and secrets (Key Vault).
Ensure auditability and compliance across environments.

General
Perform job duties in a safe manner.
Attend work as scheduled on a consistent and regular basis.
Perform other related duties as assigned.

Minimum Qualifications

At least 21 years of age.
Proof of authorization to work in the United States.
Bachelor’s degree in Computer Science or an IT field, or equivalent experience.
Must be able to obtain and maintain any certification or license, as required by law or policy.
7 years of experience in SRE/DevOps/Platform roles, with 4 years focused on Azure in production at scale.
Expert knowledge of Infrastructure as Code (Terraform or Bicep) and Git-based workflows (GitHub Actions/Azure DevOps).
Proficiency in CI/CD, deployment strategies (canary, blue-green), and automated rollbacks.
Proficiency in PowerShell and Python for automation; experience building reusable modules.
Demonstrated experience with AKS, App Services, Functions, VM Scale Sets, and Azure networking/security.
Deep knowledge of:
Azure: AKS, App Services, Functions, VMSS, Storage, Front Door, API Management, Load Balancers, Monitor, Log Analytics, App Insights, Key Vault, Policy, Defender
Automation & IaC: Terraform/Bicep, PowerShell, Python, GitHub Actions/Azure DevOps
Observability: Azure Monitor, Log Analytics, App Insights, Prometheus/OpenTelemetry; experience with ITRS Geneos
Service Management: ServiceNow, Jira
Proficiency in SRE fundamentals: SLIs/SLOs, error budgets, capacity planning, chaos testing, and toil reduction.
Demonstrated experience leading incidents and collaborating across teams.
Strong interpersonal skills with the ability to communicate effectively and interact appropriately with management, other Team Members, and outside contacts of different backgrounds and levels of experience.
Must be available to work varied shifts, including nights, weekends, and holidays, to ensure 24/7 coverage.
Provide off-hours support on an infrequent but as-needed basis during critical incidents. (Potential shifts may run 24/7 due to the needs of the business.)
Team Members are required to be on site within the IT Command Center.

Preferred Qualifications

Certifications & Training
AZ-400: Azure DevOps Engineer Expert
AZ-305: Azure Solutions Architect Expert or AZ-104: Azure Administrator
AZ-500: Azure Security Engineer Associate
ITIL v4 for operational rigor
SRE Foundation/Practitioner Certification (DevOps Institute or equivalent)

Physical Requirements

Must be able to:
Lift or carry 50 pounds, unassisted, in the performance of specific tasks, as assigned.
Physically access assigned workspace areas with or without reasonable accommodation.
Work indoors and be exposed to various environmental factors such as, but not limited to, CRT, noise, and dust.
Utilize laptop and standard keyboard to perform essential functions of the job.

