Job Overview:
We are seeking a dynamic, motivated, and experienced Engineer for our Site Reliability Engineering Team to drive the traceability and performance of LPL's business-critical transactions across multiple systems.
In this role, you will design and implement strategies to monitor, trace, and debug critical transaction flows while ensuring system resilience.
You will work closely with cross-functional teams and vendor partners to establish robust observability frameworks and ensure seamless end-to-end transaction visibility.
This is an exciting opportunity to lead a critical function within our organization, driving meaningful change and enhancing the advisor experience.
If you are passionate about SRE and Observability and have a track record of success, we invite you to apply and be part of our journey toward greater resilience and efficiency.
Responsibilities:
End-to-End Observability: Design and implement observability frameworks for end-to-end transaction traceability across microservices, APIs, databases, and third-party integrations.
Leverage tools such as Dynatrace, OpenTelemetry, ELK, and Grafana to trace transactions and visualize dependencies.
Build actionable dashboards and alerts to provide real-time insights into transaction health and performance.
Performance Optimization: Monitor transaction latency, throughput, and error rates to identify bottlenecks and optimize performance.
Use distributed tracing and telemetry data to analyze and resolve issues impacting transaction flows.
Work with application and database teams to fine-tune configurations for better transaction efficiency.
Collaboration & Governance: Partner with application teams, architects, and business stakeholders to define transaction observability and resiliency requirements.
Develop and enforce standards for transaction monitoring and tracing across teams and environments.
Provide training and guidance to teams on implementing best practices for observability and resiliency.
Critical Transaction Resiliency: Identify and prioritize business-critical transaction flows across distributed systems.
Develop strategies to ensure high availability and resilience for critical transactions.
Implement failover mechanisms, redundancy strategies, and fault-tolerant designs for transaction paths.
Collaborate with Site Reliability Engineering (SRE) and DevOps teams to conduct chaos engineering exercises to test resiliency.
Define and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for critical transaction paths.
Documentation & Reporting: Maintain comprehensive documentation of transaction flows, dependencies, and observability configurations.
Provide regular reports on transaction health, performance trends, and resiliency improvements to leadership.
Develop playbooks for handling transaction-related incidents and outages.
Achieve a 30% reduction in MTTD and MTTR within the first year of operation, demonstrating the effectiveness of the team's SRE, observability, and self-healing capabilities.
Identify the offending service and root cause for at least 70% of incidents within 1 hour through effective use of observability platforms.
Create reusable templates and frameworks.
Detect at least 90% of issues through monitoring systems.
Foster a culture of continuous improvement, with regular training sessions and knowledge-sharing initiatives that empower team members.
Develop and maintain strong relationships with key stakeholders to ensure alignment and support for the SRE Team’s objectives, enhancing overall organizational resilience.
What are we looking for?
We want strong collaborators who can deliver a world-class client experience.
We are looking for people who thrive in a fast-paced environment, are client-focused and team-oriented, and are able to execute in a way that encourages creativity and continuous improvement.
Requirements:
5+ years in observability, SRE, or related roles with a focus on transaction monitoring and tracing.
Hands-on experience with tools such as Dynatrace, ELK, Datadog, Splunk, OpenTelemetry, Jaeger, or equivalent.
Expertise in monitoring critical transactions in cloud environments (AWS, Azure, or Google Cloud).
Strong understanding of microservices architecture, APIs, and distributed systems.
Proficiency in scripting or programming languages (e.g., Python, Go, Java) for automation and integration.
Preferences:
Certifications: Dynatrace Associate or Professional Certification.
Experience with OpenTelemetry and other observability standards.
Knowledge of chaos engineering practices and their integration with Dynatrace.
Familiarity with AIOps platforms and automation solutions.
#LI-Hybrid
Pay Range:
$92,288-$153,813/year
Actual base salary varies based on factors including, but not limited to, relevant skill, prior experience, education, base salary of internal peers, demonstrated performance, and geographic location.
Company Overview:
LPL Financial Holdings Inc. (Nasdaq: LPLA) was founded on the principle that the firm should work for advisors and institutions, and not the other way around.
Today, LPL is a leader in the markets we serve, supporting more than 23,000 financial advisors, including advisors at approximately 1,000 institutions and at approximately 580 registered investment advisor (RIA) firms nationwide.
We are steadfast in our commitment to the advisor-mediated model and the belief that Americans deserve access to personalized guidance from a financial professional.
At LPL, independence means that advisors and institution leaders have the freedom they deserve to choose the business model, services, and technology resources that allow them to run a thriving business.
They have the flexibility to do business their way.
And they have the freedom to manage their client relationships, because they know their clients best.
Simply put, we take care of our advisors and institutions, so they can take care of their clients.
Join LPL Financial: Where Your Potential Meets Opportunity
At LPL Financial, we believe that everyone deserves objective financial guidance.
As the nation’s leading independent broker-dealer, we offer an integrated platform of cutting-edge technology, brokerage, and investment advisor services.