Site Reliability Engineer
London
Job description
Who we are
Hello, we’re Zempler Bank, formerly Cashplus Bank. We’re here to make money simpler. We know that banking isn’t at the top of most people’s to-do lists. That’s why making it less of a chore is at the top of ours. We don’t do banking the traditional way – the wrong way. We do banking that works for the people who need it, when they need it. We’re for the crafters, the grafters, and the self-starters.
The bank for sole traders and those just starting up. And small businesses with growth on their minds and less time than they’d like. Just because you’re going it alone doesn’t mean you have to do it on your own. That’s where we step in – smart technology, straightforward human service, and bank accounts that meet real-life needs.
We’ve already helped over 650,000 businesses get off the ground. We’re here to do banking right and support our customers for the long haul.
Our business goals are:
- Become a bank that Customers Love
- Develop Brilliant, Sustainable Products
- Earn an Outstanding Reputation
The Role
Working as part of the Site Reliability Engineering team, the Senior SRE is a third-line role responsible for the performance, availability and support of all hardware and services. This includes working with Development, Info Sec and Service Delivery to ensure our platform is resilient, secure and scalable, with increasing levels of automation, to meet the needs of the Bank.
Team Hybrid Working Style
We are very proud to offer one of the most flexible hybrid working arrangements in the industry!
This role is expected to spend a minimum of one day each month working from our London Bridge office.
Key Accountabilities Include
Site Reliability Engineering
Teamwork, Documentation and Development – design, document and share specialist knowledge with other members of the team, including delivering training sessions when required and taking responsibility for all relevant documentation (updates, storage and roll-out).
Service Continuity – take ownership of the architecture and design of the underlying Banking Platform to deliver agreed RTO, RPO and MAO targets, including supporting BCP activities and regular testing of DR and BCP.
Service Levels – understand and contribute to the Service Catalogue and architect to deliver agreed SLAs, including working with relevant third-party suppliers as required.
Security – ensure high levels of security by design, and architect a platform that supports monthly patching and vulnerability management to meet company-approved information security policies and procedures, including PCI-DSS v4 and NIST-CSF.
Lifecycle Support – support management of IT assets to ensure they are fully supported, including planning upgrades or replacements prior to end of life, to avoid increased risk or service interruption.
Financial – ensure software licences are used within agreed T&Cs and that physical assets, including cloud, are utilised efficiently to avoid waste.
Availability – achieve SLAs by building and maintaining services with no Single Points of Failure, identifying weak or failing components for replacement before they cause incidents.
Capacity Support – monitor infrastructure usage over time and configure alerts to ensure we are always ‘one step ahead’ of demand.
Incident Support – configure and respond to monitoring alerts for issues with any devices, supporting incidents 24x7 (via a paid team on-call rota) to recover service as quickly as possible and within agreed RTO and RPO, escalating when required.
Problem Resolution – support the PIR process and provide recommendations to avoid future incidents, including timely delivery of agreed solutions.
Configuration and Assets – maintain configuration repositories, including network diagrams, IT asset management system and agreed documentation.
Change Management – support the wider project and change programme, designing and delivering agreed improvements in line with governance processes and industry best practices, including documentation.
Releases – ensure all changes are released into, or made in, controlled environments following agreed, repeatable processes, including roll-back to a known working state.
Reporting – provide agreed reporting and updates to the CTO and wider team, including accurate status of tickets being worked on.
Horizon Scanning and Strategy – keep abreast of relevant new technologies, security threats and regulatory changes to support the Site Reliability strategy.
Risk and Control Management
Mitigate risk through best practice and by following company procedures, including Change Control.
Identify risks and escalate to management, maintain the Technology risk register, and support the wider Enterprise Risk Management framework agenda.
Personal Development Plan (PDP)
Ensure completion of company mandatory training.
Agree a PDP and objectives with line manager and track progress to agreed timescales.
Skills & Experience
Essential:
- Experience of defining and implementing IaC deployments to public cloud, ideally Azure.
- In-depth knowledge of designing, deploying and maintaining containerised workloads, ideally within public cloud.
- Experience with deploying and maintaining automation technologies like Octopus, Jenkins, Ansible, Terraform, Packer.
- Scripting experience for automation of tasks – PowerShell, Bash, etc.
- Expert knowledge of Linux and Windows operating systems.
- Networking experience – including configuration and troubleshooting of firewall configurations, BGP, OSPF and related technologies.
- Experience managing application log management tools like ELK.
- Experience managing monitoring tools like Zabbix.
- Experience with message queue technologies like RabbitMQ or IBM MQ.
- Strong experience in Windows Server architecture design, configuration and troubleshooting techniques, including Windows Server 2016/2019/2022, Active Directory, Group Policy, DNS, Certificate Services, IIS and DFS.
- Experience with Virtualisation in a Production environment - VMware, etc.
- Exposure to and ability to achieve and maintain PCI or similar security standards (e.g., PCI-DSS, NIST-CSF or ISO 27001).
- General technical skills: problem solving, network and security infrastructure, storage area networks, backups, firewalls, load balancers, virtualisation, monitoring, alerting, efficiency and optimisation, architecture design, documentation, procedural controls, identity and access management, automation, DevOps, CI/CD, 24x7 support.
Desirable:
- Experience managing and maintaining backup/replication services like Veeam, Zerto or similar.
- Experience and familiarity with one or more of the following network appliances: F5, Arista, Juniper, Fortinet.
- Experience in CentOS/RHEL, DDoS protection (e.g. Cloudflare), RDBMS (e.g. SQL Server), RabbitMQ or IBM MQ.
- Professional certification in at least one of the following programmes: Microsoft, VMware, F5, Cisco.
- Graduate calibre or equivalent, ideally CISSP, MCSE and ITIL qualified.
- Experience working in financial services, payment organisations or banks, or an understanding of working in a regulatory environment where good governance is a requirement and a benefit.
- Membership of relevant professional body.
- Strong understanding of open data sources and supporting the delivery of APIs, e.g. for open banking.
In return you'll enjoy
- Competitive basic salary
- 7.5% of salary in cash allowing you the flexibility to decide your own benefits (or simply take the cash)
- 26 days’ holiday, increasing with each year of service up to 33 days
- Ability to buy or sell a further 5 days’ holiday each year
- 4 x Life Assurance
- Pension salary sacrifice
- Family friendly policies
- Regular social activities and team events
- Charity Volunteering Day
- Free drinks and snacks in the office
Zempler Bank is an equal opportunity employer. Individuals seeking employment are considered without regard to race, religion, national origin, age, sex, gender, gender identity, gender expression, sexual orientation, marital status, medical condition, ancestry, physical or mental disability, military or veteran status, or any other characteristic protected by applicable law.
- Job type: Permanent
- Industry: MIS / IT
- Posted: 2024-11-08
Skills
- SRE