555 S. Flower Street, 18th Floor. Los Angeles, CA 90071
As an SRE, you will use your software, systems engineering, and operations background to build and run large-scale, fault-tolerant systems. Your role is to ensure the reliability, scalability and maximum uptime of the client's Cloud Platform.
- Design, develop and implement solutions that improve stability, security, scalability and availability of the client's software platforms.
- Design mechanisms for alerts and responses to identify and address reliability risks.
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, planning, and reviews.
- Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
- Design and run performance, capacity and monitoring tests.
- Create educational material such as cloud native sample apps and starter code, and help run cloud native educational events like hackathons and live coding sessions. Write educational documentation on how-tos and best practices, and blog about use cases and architectures that relate to cloud platforms.
- Liaise with the team managing our public cloud environments, including setup, management, and troubleshooting.
- *5+ years of experience in an operations, DevOps, SRE, or software engineering role
- *5+ years of experience doing development in any of Java, NodeJS, .NET Core, Python
- *3+ years of experience with development or administration on any cloud platform (Cloud Foundry, Heroku, AWS, Azure, Google Cloud, IBM Cloud, Bluemix, Kubernetes, and others). (The ideal candidate has significant experience with a Platform as a Service cloud such as Cloud Foundry.)
- *5+ years of experience developing applications with an active user base, deploying to production, and going through a change management process. (The ideal candidate can discuss their change management process in detail, including its strengths and pain points.)
Skills and Knowledge
- Experience with Splunk / Elasticsearch and Kibana
- Experience with monitoring tools such as Datadog and Dynatrace
- Experience with automating manual processes and tests
- Creativity, energy, and passion for leveraging technology to transform our industry; the belief that automation is the only way
- A good understanding of modern, cloud-centric architectures and DevOps principles
- Experience with the operational aspects of software systems such as monitoring, centralized logging, and alerting
- Experience providing standardized offerings that ensure the operational health of stacks throughout their lifecycle, including metrics collection, aggregation, and visualization; inventory; capacity; and billing/tag management
- Above-average performance. You are competitive and passionate. You thrive on challenge and have a proven ability to set ambitious but achievable goals and surpass them
*Denotes a basic qualification for the position. To be considered for this position, you must meet at least the basic qualifications.
Equal Opportunity/Affirmative Action Employer, Minorities/Females/Individuals with Disabilities/Veterans