The Company:
Marigold helps brands foster customer relationships through the science and art of connection. Marigold Relationship Marketing is a suite of world-class martech solutions that help marketers create long-term customer love and loyalty. Marigold provides the most comprehensive set of use cases for marketers at any level. Headquartered in Nashville, Tennessee, Marigold has offices globally across the United States, Europe, Australia, New Zealand, South America and Central America, as well as in Japan.
What You’ll Do:
-
Help build a Site Reliability Engineering culture by sharing your best practices, approaches, documentation, and code with other engineering teams
-
Apply automation and software to any tasks or parts of the system that would benefit from it or are performed manually
-
Troubleshoot complex OS, networking, and database issues in cloud-based SaaS and on-premises environments, handle live production incidents, debug application and infrastructure issues, and follow and implement SRE best practices
-
Monitor application performance, take steps to improve overall performance and stability, and follow through with implementation
-
Conduct system analysis, configuration management and develop improvements for system software performance, availability and reliability
-
Work closely with software and QA engineers to ensure the system is responding properly to non-functional requirements such as performance, security, and availability
-
Document your system knowledge as you acquire it over time, create runbooks, and ensure critical system information is readily available to those who need it
-
Maintain and monitor deployments, orchestration, databases, and general backend infrastructure
-
Stay up to date on security and proactively identify, diagnose, and solve complex security issues
-
Be part of an on-call rotation that supports the global platform and provides an excellent customer experience
Ideal Qualifications:
-
Degree in Computer Science or equivalent combination of education and experience
-
7+ years of experience in a DevOps or SRE role
-
7+ years of Linux experience
-
5+ years managing production environments in AWS
-
5+ years of experience with Kubernetes, preferably EKS
-
3+ years creating and maintaining infrastructure with Terraform
-
Experience using infrastructure-as-code principles to design, build and maintain cloud platforms using Terraform/OpenTofu
-
Experience working with database and data store technologies such as RDS/MySQL, ElastiCache/Redis or equivalent
-
Knowledge of core server-side concepts and experience working with cloud networking, load balancers, HTTP or gRPC protocols, and large-scale microservice environments
-
Experience with observability stacks, including instrumenting environments for logging and monitoring and designing and building dashboards and alerts
-
Knowledge of DevOps methodologies, basic programming and the tools involved in CI/CD automation
Nice to Have:
-
Experience managing high-scale web application platforms or SaaS platforms
-
Strong Kubernetes, EKS or ECS/Fargate experience
-
Deep understanding of security principles
-
History of contributing to FOSS projects
-
Experience with AWS networking concepts such as VPC peering and Transit Gateway
-
Experience with multi-geography, multi-tenant applications
-
Experience designing and performing disaster recovery
-
Experience programming with Go or Python
-
Experience with cost management
-
Experience with NoSQL databases such as ScyllaDB
-
Experience working with stream processing and big data technology stacks such as Kafka or Trino
What We Offer:
-
The competitive salary and benefits you’d expect!
-
Generous time off (we call it Open Time Away) as well as paid holidays and a birthday benefit day off.
-
Retirement contributions.
-
Employee-centric and supportive remote work environment with flexibility.
-
Support for life events including paid parental leave.