Modern Digital Applications with Lee Atchison
Welcome to Modern Digital Applications - a podcast for corporate decision makers and executives looking to create or extend their digital business with the help of modern applications, processes, and software strategy. Your host is Lee Atchison, a recognized industry thought leader in cloud computing and published author bringing over 30 years of experience.
31 episodes
All episodes
Modern Digital Applications is changing and coming back after its year-long hiatus. Join us for the launch of Modern Digital Business! Modern Digital Business will be coming later this summer. If you'd like to be informed when it's ready to launch, please go to mdb.fm/launch. We hope to see you there!…
My guest today is Kevin Goslar. Kevin is the Senior Vice President for Technology Strategy at Originate, a digital agency that helps organizations with digital transformation best practices. He has a PhD in business informatics and is an avid software developer. He is currently the maintainer of Git Town, an open-source project that provides a high-level CLI for Git. Previously, Kevin worked as a software developer at Google, which is where he was exposed to monorepos. Kevin is a Git expert and process advocate, and he's here to discuss with me the pros and cons of monorepos vs. polyrepos. This is part 2 of my interview with Kevin.…
My guest today is Kevin Goslar. Kevin is the Senior Vice President for Technology Strategy at Originate, a digital agency that helps organizations with digital transformation best practices. He has a PhD in business informatics and is an avid software developer. He is currently the maintainer of Git Town, an open-source project that provides a high-level CLI for Git. Previously, Kevin worked as a software developer at Google, which is where he was exposed to monorepos. Kevin is a Git expert and process advocate, and he's here to discuss with me the pros and cons of monorepos vs. polyrepos.…
My guest today is Beth Long. Beth worked at New Relic, where she held roles in both engineering and marketing, including two years leading the Reliability Engineering team, which owned the tooling and process for incident response and analysis. She also led New Relic's collaboration with the SNAFU Catchers, a group of researchers investigating how tech companies learn from incidents. Beth recently left New Relic to join the startup Jeli.io, where she leads the engineering team working on the industry's first incident analysis platform.
Links
Beth Long, Engineering Manager at Jeli.io
LinkedIn: https://www.linkedin.com/in/beth-adele-long/
Twitter: https://twitter.com/BethAdeleLong
Featured in this episode:
Jeli.io (https://jeli.io)
Learning From Incidents with Jeli (https://leeatchison.com/atscale/2020/12/07/learning-from-incidents-with-jeli/)
S3 Outage Mentioned in this Episode (https://thenewstack.io/dont-write-off-aws-s3-outage-fat-finger-folly/)…
My guest today is Beth Long. Beth worked at New Relic, where she held roles in both engineering and marketing, including two years leading the Reliability Engineering team, which owned the tooling and process for incident response and analysis. She also led New Relic's collaboration with the SNAFU Catchers, a group of researchers investigating how tech companies learn from incidents. Beth recently left New Relic to join the startup Jeli.io, where she leads the engineering team working on the industry's first incident analysis platform.
Links
Beth Long, Engineering Manager at Jeli.io
LinkedIn: https://www.linkedin.com/in/beth-adele-long/
Twitter: https://twitter.com/BethAdeleLong
Featured in this episode:
Jeli.io (https://jeli.io)
Learning From Incidents with Jeli (https://leeatchison.com/atscale/2020/12/07/learning-from-incidents-with-jeli/)
S3 Outage Mentioned in this Episode (https://thenewstack.io/dont-write-off-aws-s3-outage-fat-finger-folly/)…
The scheduling of a cloud migration is a complex undertaking that should be thought through and planned in advance. But in order for a migration to be successful, it’s important that you limit your risk as much as possible during the migration itself, so that unforeseen problems don’t show up and cause your migration to go sideways, fail outright, or result in unexpected outages that negatively impact your business.…
Moving your data is one of the trickiest parts of a cloud migration. During the migration, the location of your data can have a significant impact on the performance of your application. During the data transfer, keeping the data intact, in sync, and self-consistent requires either tight coordination or—worse—application downtime. Moving your data and the applications that utilize the data at the same time is necessary to keep your application performance acceptable. Deciding how and when to migrate your data relative to your services, though, is a complex question. Often companies will rely on the expertise of a migration architect, which is a role that can greatly contribute to the success of any cloud migration. Whether you have an on-staff cloud architect or not, there are three primary strategies for migrating application data to the cloud.…
My guest today is Thomas Curran. Thomas is a cloud executive with many years of experience, including VP of Technology and Innovation at Deutsche Telekom and Technology Advisor at Deutsche Börse. He is the co-founder of the Ory Software Foundation, which owns a very popular open-source, Go-based identity management library named Kratos, along with other open-source identity management tools. Now, Thomas is a co-founder of Ory Corp, an open-source identity infrastructure and services company. Thomas is with me today from his office in Munich, Germany, to talk about application identity management. By way of full disclosure, I’ve worked with Thomas personally for many years, first meeting him back when he was at Deutsche Börse. I’m currently working directly with Thomas at Ory, architecting their new cloud infrastructure.

Links and More Information
* Thomas LinkedIn (https://www.linkedin.com/in/thomasaidancurran/)
* Ory (https://ory.sh)

Tech Tapas — History of the Term SaaS
When did software as a service start? Well, that depends on what you mean by the term… depending on how you define SaaS, the answer is either the early 1960s or somewhere around 2005. Back in the early days of computing, all applications ran on a centralized computer. Users accessed the computers remotely, initially via punch cards and later via remote terminals. The centralized nature of the application is, by a literal definition, software as a service. But the modern definition of SaaS is tied much more closely to cloud computing. SaaS nowadays refers to software running centrally, typically in a public or private cloud environment, and shared among multiple users. A thin client of some sort — either a web browser or a thin mobile application — is used to front the centralized application. From a business model standpoint, users don’t buy SaaS software; instead they rent or lease access to it with monthly or annual fees. Alternatively, the service could be free and supported by advertising or other monetization processes. This is the heart of the business model for social media, for example. So, SaaS is an old term that has been given new meaning in recent years. But it’s the recent definition that has really changed the way people think about and build software today.

Tech Tapas — Amazon S3
Amazon S3 is a highly durable, highly available file and object storage service in the cloud. It is the go-to service for most companies that want to store huge quantities of data in the cloud, or for long-term persistent object storage. S3 was designed with the goals of being highly available, highly durable, and highly scalable. The design goal for availability is 99.99%, with a durability of objects of 99.999999999% (that’s 11 9’s). How available? The 4 9’s availability translates to a total of 52 minutes of downtime per year. How durable? The 11 9’s durability means that if every man, woman, and child in the world had an object in S3, then Amazon would lose at most one of those objects approximately once every 15 years. These are amazing goals, and they are one of the reasons S3 has such a great reputation as a high-quality object storage system. S3 was one of three initial AWS services and was a big part of AWS’s early success.…
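The availability and durability figures quoted above lend themselves to a quick back-of-the-envelope check. The Python sketch below is not from the episode; the 8-billion-object population and the 365.25-day year are my own round-number assumptions used only to reproduce the "52 minutes" and "once every ~15 years" claims.

```python
# Back-of-the-envelope check of the S3 design goals quoted above.
# Assumptions (not from the episode): 365.25-day year, 8 billion objects,
# independent annual loss probability per object.

AVAILABILITY = 0.9999          # "four 9s" availability design goal
DURABILITY = 0.99999999999     # "eleven 9s" durability design goal

minutes_per_year = 365.25 * 24 * 60
expected_downtime_min = (1 - AVAILABILITY) * minutes_per_year
print(f"Expected downtime per year: {expected_downtime_min:.1f} minutes")   # ~52.6

objects = 8_000_000_000              # roughly one object per person on Earth
annual_loss_prob = 1 - DURABILITY    # chance a single object is lost in a year
expected_losses_per_year = objects * annual_loss_prob
years_per_lost_object = 1 / expected_losses_per_year
print(f"Expected object losses per year: {expected_losses_per_year:.2f}")   # ~0.08
print(f"Roughly one lost object every {years_per_lost_object:.0f} years")   # ~12-15
```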
My guest today is Thomas Curran. Thomas is a cloud executive with many years of experience, including VP of Technology and Innovation at Deutsche Telekom and Technology Advisor at Deutsche Börse. He is the co-founder of the Ory Software Foundation, which owns a very popular open-source, Go-based identity management library named Kratos, along with other open-source identity management tools. Now, Thomas is a co-founder of Ory Corp, an open-source identity infrastructure and services company. Thomas is with me today from his office in Munich, Germany, to talk about application identity management. By way of full disclosure, I’ve worked with Thomas personally for many years, first meeting him back when he was at Deutsche Börse. I’m currently working directly with Thomas at Ory, architecting their new cloud infrastructure.
Links and More Information
* Thomas LinkedIn (https://www.linkedin.com/in/thomasaidancurran/)
* Ory (https://ory.sh)…
The scheduling of a cloud migration is a complex undertaking that should be thought through and planned in advance. Typically, a migration architect is involved and makes the difficult technical decisions of what to migrate when, in concert with organization management to take the business needs into account. But for a migration to be successful, it’s important that you limit your risk as much as possible during the migration, so that unforeseen problems don’t show up and cause your migration to go sideways, fail, or result in unexpected outages that negatively impact your business. When scheduling the migration, there are a number of things you should keep in mind to increase the likelihood of a successful migration and reduce the risk of the migration itself. Here are five key methods for reducing the risk of your cloud migration, and hence increasing your overall chance of success.

Links and More Information
The following are links mentioned in this episode, and links to related information:
• Modern Digital Applications Website (https://mdacast.com)
• Lee Atchison Articles and Presentations (https://leeatchison.com)
• Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com)
• Advising and Consulting Services by Lee Atchison (https://atchisontechnology.com)
• Course: Building a Cloud Roadmap, 2018-2019 (https://leeatchison.com/classes/building-a-cloud-roadmap/)

Key #1. Limit the complexity of migrating your data.
The process of migrating your data from your on-premises datastores to the cloud is, itself, the hardest, most dangerous, and most time-consuming part of your migration. There are many ways to migrate your data…some of the methods are quite complex and some of them are very basic. Some of them result in no need for downtime; others require significant downtime to implement. There is a tradeoff you need to make between the complexity of the migration process and the impact that complexity has on the migration, including the potential need for site downtime. While in some scenarios you must implement a complex data migration scheme to reduce or eliminate downtime and reduce risk along the way, in general I recommend choosing as simple a data migration scheme as possible given your system constraints and business constraints. The more complex your data migration strategy, the riskier your migration. By keeping the data migration process as simple as practical given your business constraints, you reduce the overall risk of failure in your migration. Be aware, though, that you may require a certain level of migration complexity in order to maintain data redundancy and data availability during the migration itself, so the simplest possible migration process may not be available to you. Still, it’s important that you select the simplest migration process that achieves your business and technical migration goals.

Key #2. Reduce the duration of the in-progress migration as much as possible.
Put another way, do as much preparation work before you migrate as you can, and then once you start the migration, move as quickly as possible to completing it, postponing as much work as possible until after the migration is complete and validated. By doing as much preparation work before the migration as possible and pushing as much cleanup work to after the migration as possible, you reduce the time and complexity of the migration itself. Given that your application is most at risk of a migration-related failure during the migration process itself, reducing this in-migration time is critical to reducing your overall risk. For example, it may be possible to accept somewhat lower overall application performance in the short term, during the migration, in order to get to the end of your migration quicker. Then, after the migration is complete, you can do some performance refactoring to improve your overall performance situation. While postponing an important performance improvement is not normally ideal, as it increases your technical debt, in this case the delay may be more beneficial, because it allows the migration itself to complete quicker, which reduces your overall migration-related risk. Remember, your application is most at risk of something going wrong from the time the migration starts until the time when the entire application is moved to the cloud and the migration is complete. By postponing this performance improvement work until post-migration, you are able to keep the time of migration to as short a period as possible, hence reducing your risk.

Key #3. Leave yourself as many options to back out a migration step as you can.
The more options you have to revert a step, the less risky your migration will be as a whole. As you take each step of the migration, think before you execute the step. Ask yourself how you could back out of the step if something happened that requires you to back off. Every step you take that can be reversed if a problem occurs means your risk of executing the step is reduced. If you take a step that cannot be reversed, you are operating under a higher risk.

Key #4. Be conscious of interim performance issues and the impact on system availability.
If you are doing your cloud migration a service at a time…module at a time…or application at a time, there will be long periods when some services, modules, and applications are in the cloud…and some are still on-prem. Be very careful during these interim times. Latency of communications between components that are in the cloud and components that are still on-prem will be significantly higher than if both components are in the cloud, or both are on-prem. This means that, during the migration, when some components have migrated and some have not, your overall system latency will be different…potentially significantly worse…than it was before the migration or after the migration. This latency may impact the usability of your system, and it may impact the availability of your system, since a significant change in latency between some components could cause unseen defects to become exposed and cause problems. During the migration, as some components have been moved, your application will act differently…and slower…than it will at any other time. Plan for these differences and use these differences in your determination of which modules to move when.

Key #5. Do as much refactoring before you migrate as you possibly can.
While this is a corollary of many of the previous keys, it is important to stress here as well. The more work you can do to prepare your application for the cloud before you move it, the more likely you will succeed in making the migration successful, and the quicker the migration will go overall.

So, in summary, the five keys to reducing your cloud migration risk are:
• Limit the complexity of data migration.
• Reduce the duration of all in-progress migrations.
• Leave yourself backout options.
• Be conscious of interim performance issues.
• Do as much refactoring as you can before you begin the migration.
Keeping track of these five keys to risk reduction while you create your migration plan will help reduce your overall migration time and…most importantly…reduce the risk of a migration-related failure.

Tech Tapas – Services vs Microservices
There is some controversy in the industry about the use of the terms service and microservice. I personally do not like the term “microservice”. Why is that? It’s because the term “microservice” implies a specific sizing of a service that is not necessarily a healthy assumption. Yes, many services are small, some are truly “micro”, but many are much larger too. The appropriate size for a service is based on context and is subject to many concerns and criteria. In my mind, the use of the term “microservices” biases this discussion. However, I recognize that the term microservice has gained strong popularity in the industry. There are also people that pigeonhole the use of the term “service” as part of the broader and older category of “SOA”, or “Service Oriented Architectures”. They further pigeonhole the term to refer to a particular type of architecture offering that was popular a decade or more ago. I find these comparisons inaccurate and confusing, and I do not believe the reference is reasonable. This makes “service” sound old and ancient, which is not at all true. As far as which term to use…my personal preference is to use the term “service”, but I recognize many people use the term “microservice”. So, I tend to use both terms in my discussions with other companies, depending on context. In my mind, both the term service and the term microservice mean the exact same thing, and as such, I use the two terms interchangeably. When you hear me talk about services, microservices, service architectures, etc., I am referring to the exact same thing.…
Amazon Web Services provides a cloud certification program to encourage and enable growing your AWS cloud technical skills, to help you grow your career and your business. Have you wondered what it takes to become AWS certified? In this episode, I conclude my interview with Kevin Downs, a trial-by-fire expert on the AWS certification program, as we discuss the AWS cloud certification program and how best to utilize it. And then, what was the first AWS service? This is AWS Certifications, on Modern Digital Applications.
Links and More Information
The following are links mentioned in this episode, and links to related information:
Modern Digital Applications Website (https://mdacast.com)
Lee Atchison Articles and Presentations (https://leeatchison.com)
Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com)
Advising and Consulting Services by Lee Atchison (https://atchisontechnology.com)
AWS Certifications (https://aws.amazon.com/certification/)
A Cloud Guru (https://acloudguru.com)
Kevin Downs Twitter (https://twitter.com/kupsand)
Kevin Downs LinkedIn (https://www.linkedin.com/in/kevin-downs/)
This episode is part 2, and the final part, of my interview with Kevin Downs.…
Amazon Web Services provides a cloud certification program to encourage and enable growing your AWS cloud technical skills, to help you grow your career and your business. Have you wondered what it takes to become AWS certified? In this episode, join me with Kevin Downs, a trial-by-fire expert on the AWS certification program, as we discuss the AWS cloud certification program and how best to utilize it. And then, what was EC2 like in the old days, back before it was actually useful? This is AWS Certifications, on Modern Digital Applications.
Links and More Information
The following are links mentioned in this episode, and links to related information:
Modern Digital Applications Website (https://mdacast.com)
Lee Atchison Articles and Presentations (https://leeatchison.com)
Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com)
Advising and Consulting Services by Lee Atchison (https://atchisontechnology.com)
AWS Certifications (https://aws.amazon.com/certification/)
A Cloud Guru (https://acloudguru.com)
Kevin Downs Twitter (https://twitter.com/kupsand)
Kevin Downs LinkedIn (https://www.linkedin.com/in/kevin-downs/)
This episode is part 1 of 2 of my interview with Kevin Downs.…
Likelihood and severity. Two different measures for two different aspects of measuring risk in a modern digital application. They are both measures of risk, but they measure different things. What is the difference between likelihood and severity? And why does it matter? In this episode, I’ll discuss likelihood and severity, how they are different, and how they are both useful measures of risk in a modern digital application.

Links and More Information
The following are links mentioned in this episode, and links to related information:
• Modern Digital Applications Website (https://mdacast.com)
• Lee Atchison Articles and Presentations (https://leeatchison.com)
• Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com)
• Advising and Consulting Services by Lee Atchison (https://atchisontechnology.com)
• Learning Path - Risk Management (http://leeatchison.com/classes/learning-path-risk-management/)
• O’Reilly Learning Path Course (https://learning.oreilly.com/learning-paths/learning-path-microservices/9781492061106/)

Microservice architectures offer IT organizations many benefits and advantages over traditional monolithic applications. This is especially true in cloud environments, where resource optimization works hand-in-hand with microservice architectures. So it’s no mystery that so many organizations are transitioning their application development strategies to a microservices mindset. But even in the realm of microservices, building and operating an application at scale can be daunting. Problems can range from something as fundamental as having too few resources and too little time to continue developing and operating your application, to underestimating the needs of your rapidly growing customer base. At its best, failure to build for scale can be frustrating. At its worst, it can cause entire projects—even whole companies—to fail. Realistically, we know that it’s impossible to remove all risk from an application. There is no magic eight ball — no crystal ball — that allows you to see into the future and understand how decisions you make today impact your application tomorrow. Risk will always be a burden to you and your application. But we can learn to mitigate risk. We can learn to minimize and lessen the impact of risk before problems associated with the risk negatively impact you and your applications. I’ve worked in many organizations, and have observed many more. Planning for problems is very hard and something most organizations fail to do properly. Technical debt is often a nebulous concept. Quantifying risk is the first step to understanding vulnerability. It also helps set priorities and goals. Is fixing one potential risk more important than another? How can you decide if the risks aren’t understood and quantified? In this episode, we’re going to talk about how to measure risk, so that you can build, maintain, and operate large, complex, modern applications at scale.

There is a great quote by Donald Rumsfeld, twice former secretary of defense for the United States. It starts “Reports that say that something hasn’t happened are always interesting to me”. He goes on to say: “because, as we know, there are known knowns, there’re things we know we know. We also know there are known unknowns, that is to say we know there are some things we do not know.” “But there are also unknown unknowns. The ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones.” This is true in running a country, and a country’s military, and it is true in running a modern digital application at scale. This quote encompasses the entire meaning of risk management. Risk management is about dealing with the unknown unknowns.

You will often hear me talk about my big game example. This is the example where you invite 20 of your closest friends over to your house to watch the big game on your brand new big screen TV. Only, once the party — and the game — start, the power goes out in your home. Your big day is over, and your friends go home disappointed. Now, what would you do if you called the power company to report this outage, and their response was “What are you complaining about? You have power most of the time. In fact, we see you have power 95% of the time. Who cares if power goes out the other 5% of the time?” Who cares indeed. The reality is the power company can’t operate in this way. They cannot be satisfied with “good enough” service. They have to strive to provide power to you 100% of the time, 24 hours a day, 7 days a week. They have to strive for perfection. This difference from perfection, this extra 5%, is driven by the unexpected actions and problems that we see. It’s driven by the unknowns. It’s driven by the things we don’t even know that we don’t know. It’s driven by the unknown unknowns. Preparing for these unknown unknowns is what risk management is all about.

Risk, like anything else, can be quantified. There are two fundamental metrics that matter most when quantifying risk: likelihood…and severity. Likelihood is the measure of the chance of a particular risk triggering. Or, put another way, it’s the measure of the chance of a particular risk occurring. We say “what’s the likelihood of our pipes freezing tonight?” Or “what’s the likelihood of us getting rain tomorrow?” Or “what’s the likelihood of a tornado hitting our house?” Likelihood measures the possibility of an event happening. The likelihood of you getting rain tomorrow, for instance, is most definitely significantly higher than the likelihood of your house getting hit by a tornado tomorrow. That’s likelihood. Severity is the measure of the cost of a risk that triggers. If a risk occurs, what really happens, and how severe are the ramifications? Using the examples above, the severity of rain hitting you on the head is pretty low. The severity of your pipes freezing is greater. But the severity of a tornado hitting your house — well, that is severe. The impact of each of those three things happening is different, and that difference is measured by severity. It’s important to keep these two things distinct and understand the difference between them. Likelihood is IF an event will occur. Severity is WHAT the cost is of the event occurring. The chance of rain tomorrow might be high (likelihood) but it doesn’t hurt you too much if it does (severity). The chance of your house getting hit by a tornado is very low (likelihood) but the impact of that event would be catastrophic for you (severity). These two measures work together to quantify the risk of a particular event. These two measures together are what we use to track and measure risk…whether that risk is a weather-related risk, or a risk of an application failure in your business systems.

In my book, Architecting for Scale, I give an example of risk measurement in a modern application using a t-shirt e-commerce store. We can measure the risk of a failure of components of this application using likelihood and severity. For example…the e-commerce store probably has a top ten list component — a service that generates a top ten list of products sold through the site. What’s the risk of the top ten list not appearing? The likelihood of the list not appearing is probably relatively low — it’s a simple component without a lot of complexity to it. Likewise, the severity of the problem of the top ten list not appearing is also low. If customers can’t see the top ten list, it doesn’t significantly impact their buying experience. This would be a low likelihood, low severity problem. In shorthand, it’d be a low/low risk. But what about the order database? What’s the risk if the order database stops accepting new orders? Well, once again, the likelihood of that happening is probably relatively low. It’s an important subsystem that we’ll assume is probably well maintained. But if it does happen, the severity of that problem is quite high. If you can’t accept orders, your entire business suffers. This would be a low likelihood, high severity problem. Shorthand, a low/high risk. Moving on, let’s say your store uses a custom font to make the display more visually pleasing. What’s the risk of the font not loading in a user’s browser? Well, the likelihood of this happening might in fact be quite high. You can imagine scenarios where a user’s browser has a poor internet connection and the font file doesn’t load correctly. Or maybe you are using a 3rd party font service to provide the fonts dynamically. The likelihood of this problem occurring could actually be quite high. But what about the severity? Here the severity is probably quite low. If the custom font doesn’t load, the browser will just substitute a different font for the page. The page will still work, it just might not look quite as visually appealing as you desire it to be. This would be a high likelihood, low severity problem. Shorthand, a high/low risk. Finally, let’s take a look at the t-shirt photos that appear in the store. These are the pictures of products that customers might buy. What’s the risk of the photos not appearing on a page? Well, the likelihood of this risk could be high, because showing photos on a page means loading them from a cache server or maybe a 3rd party CDN, and this system might not be working quite right. The photos may not be available, or the user’s internet connection could flake out and not show them. The likelihood of this problem occurring is, in this example, high. What about severity? Well, it’s hard to imagine that a customer would buy a t-shirt that they could not see a photograph of, so if the photos aren’t appearing, that could have a big impact on your business, since people would buy fewer t-shirts. The severity of this problem is also high. This would be a high likelihood, high severity problem. In shorthand, this would be a high/high risk. These are four examples of problems that might occur in an e-commerce store, and the risk measurement associated with them happening. Now that we can measure the risk, we can use that measurement to prioritize work to mitigate or remove those risks. We can imagine that mitigating or removing a high/high risk would be more critical than a high/low or low/high risk, and all of them would be more important than working on a low/low risk. We can properly determine which risks are most helpful for us to work on, and we can measure the impact of our work to mitigate those risks. In future episodes, I will continue the topic of risk management and discuss tools and techniques for monitoring, reporting, and mitigating risk in our applications, with the ultimate goal of reducing the impact that risk has on the availability of our applications.…
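The likelihood/severity shorthand from the t-shirt store example above can be captured in a few lines of code. The sketch below is illustrative only; the risk items, the two-level scale, and the equal weighting of likelihood and severity are simplifying assumptions of mine, not something prescribed in the episode or the book.

```python
# Minimal sketch of the likelihood/severity shorthand from the t-shirt
# store example. The scoring scale and item names are illustrative only.

from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    likelihood: str  # "low" or "high"
    severity: str    # "low" or "high"

    @property
    def priority(self) -> int:
        # high/high sorts first, low/low last; likelihood and severity are
        # weighted equally here, which is a simplifying assumption.
        score = {"low": 0, "high": 1}
        return score[self.likelihood] + score[self.severity]

risks = [
    Risk("Top ten list fails to render", "low", "low"),
    Risk("Order database stops accepting orders", "low", "high"),
    Risk("Custom font fails to load", "high", "low"),
    Risk("Product photos fail to load", "high", "high"),
]

# Work the highest combined risk first.
for r in sorted(risks, key=lambda r: r.priority, reverse=True):
    print(f"{r.likelihood}/{r.severity}  {r.name}")
```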
Building a scalable application that has high availability is not easy. Problems can crop up in unexpected ways that can cause your application to stop working and stop serving your customers' needs. No one can anticipate where problems will come from, and no amount of testing will identify and correct all issues. Some issues end up being systemic problems that require the correlation of multiple systems in order for the problems to occur. Some are more basic, but are simply missed or not anticipated.

Links and More Information
The following are links mentioned in this episode, and links to related information:
Modern Digital Applications Website (https://mdacast.com)
Lee Atchison Articles and Presentations (https://leeatchison.com)
Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com)

Application availability is critical to all modern digital applications. But how do you avoid availability problems? You can do so by avoiding those traps that cause poor availability. There are five main causes of poor availability that impact modern digital applications.

Poor Availability Cause Number 1
Often, the main driver of application failure is success. The more successful your company is, the more traffic your application will receive. The more traffic it receives, the more likely you will run out of some vital resource that your application requires. Typically, resource exhaustion doesn’t happen all at once. Running low on a critical resource can cause your application to begin to slow down, backlogging requests. Backlogged requests generate more traffic, and ultimately a domino effect drives your application to fail. But even if it doesn’t fail completely, it can slow down enough that your customers leave. Shopping carts are abandoned, purchases are left uncompleted. Potential customers go elsewhere to find what they are looking for. Increase the number of users using your system or increase the amount of data these users consume in your system, and your application may fall victim to resource exhaustion. Resource exhaustion can result in a slower and unresponsive application.

Poor Availability Cause Number 2
When traffic increases, sometimes assumptions you’ve made in your code about how your application can scale are proven to be incorrect. You need to make adjustments and optimizations on the fly in order to resolve or work around those assumptions and keep your system performant. You need to change your assumptions about what is critical and what is not. The realization that you need to make these changes usually comes at an inopportune time. It comes when your application is experiencing high traffic and the shortcomings start becoming exposed. This means you need a quick fix to keep things operating. Quick fixes can be dangerous. You don’t have time to architect, design, prioritize, and schedule the work. You can’t think it through to make sure the change is the right long-term change. You need to make changes now to keep your application afloat. These changes, implemented quickly and at the last minute with little or no forethought or planning, are a common cause of problems. Untested and minimally tested fixes, hastily thought-through fixes, bad deployments caused by skipping important steps: all of these things can introduce defects into your production environment. The fact that you need to make changes to maintain availability will itself threaten your availability.

Poor Availability Cause Number 3
When an application becomes popular, your business needs usually demand that your application expand and add additional features and capabilities. Success drives larger and more complex needs. These increased needs make your application more complicated and require more developers to manage all of the moving parts. Whether these additional developers are working on new features, updated features, bug fixes, or other general maintenance, the more individuals that are working on the application and the more moving parts that exist, the greater the chance of a problem occurring that brings your application down. The more your application is enhanced, the more likely an availability problem is to occur.

Poor Availability Cause Number 4
Highly successful applications usually aren’t islands unto themselves. Highly successful applications often interact with other applications, either applications that are part of your application suite, or third-party applications. Third-party applications can be provided by vendors or partners. They can be external SaaS services. Or they can be integrations with customer systems. The more dependencies you have, the more exposed you are to problems introduced by those other external systems. Your availability will ultimately become tied to the availability and quality of those external applications. The more dependencies you have, the more fragile your application becomes.

Poor Availability Cause Number 5
As your application grows in complexity, the amount of technical debt your application has naturally increases. Technical debt is the accumulation of desired software changes and pending bug fixes that typically build up over time as an application grows and matures. Technical debt, as it builds up, increases the likelihood of a problem occurring. The more technical debt you have, the greater the likelihood of an availability problem.

Conclusion
All fast-growing applications have one or more of these problems. These problems are the sort of problems that increase the risk of having a problem with availability. Potential availability problems can begin occurring in applications that previously performed flawlessly. The problems can quietly creep up on you, or they may start suddenly without warning. But most applications, growing or not, will eventually have availability problems. Availability problems cost you money, they cost your customers money, and they cost you your customers' trust and loyalty. Your company cannot survive for long if you constantly have availability problems. Focusing on these five causes will go a long way toward improving the availability of your applications and systems.

Tech Tapas — Database backup test failure
I want to tell you a story. You tell me if this is OK or not. This comes from a conversation I heard in a company I was working with. The conversation was a message from one engineer to their peers; they were trying to update them on the situation of a production database. The message went like this: “We were wondering how changing a setting on our MySQL database might impact our performance…” “…but we were worried that the change might cause our production database to fail.” “Since we didn’t want to bring down production, we decided to make the change to the replica database instead…the backup database…” “After all, it wasn’t being used for anything at the moment.” Of course, you can imagine what happened next, and you would be right. The production database had a hardware failure, and the system automatically tried to switch over to use the replica database. But the replica database was in an inconsistent state due to the experimentation that was going on with it. As such, the replica database was not able to take on the job as the new master…it quickly became overwhelmed…and then it failed as well. Both the original master and the replica failed. The replica, whose sole purpose for existence was to take over in case the master failed, wasn’t able to do so because it was being tinkered with by other engineers. Those other engineers didn’t understand that just because the replica wasn’t actively servicing production traffic, that doesn’t mean it wasn’t being used. Its entire job was to sit and wait, ready to take over if necessary. By experimenting on that replica database, they were inadvertently impacting production. They were introducing risk into the production system — risk that wasn’t appropriate. Risk that could — and in this case did — cause serious problems. This, by the way, was a true story. But it also is not an uncommon story. I hear similar sorts of problems come up in many engineering conversations, and many operations management conversations. Not having a clear understanding or appreciation of how certain actions impact the risk management plans for a production system can be disastrous. This is why active and continuous risk management planning is critical for production systems to stay operational.…
We often hear that being able to scale your application is important. But why is it important? Why do we need to be able to suddenly, and without notice, scale our application to handle double, triple, or even ten times the load it is currently experiencing? Why is scaling important? In this episode, I am going to talk about four basic reasons. Four reasons why scaling is important to the success of your business. And then, what is the dynamic cloud? This is Application Scaling, on Modern Digital Applications.

Links and More Information
The following are links mentioned in this episode, and links to related information:
Modern Digital Applications Website (https://mdacast.com)
Lee Atchison Articles and Presentations (https://leeatchison.com)
Architecting for Scale, published by O’Reilly Media (https://architectingforscale.com)

Why you must scale
We often hear that being able to scale your application is important. But why is it important? Why do we need to be able to suddenly, and without notice, scale our application to handle double, triple, or even ten times the load it is currently experiencing? Why is scaling important? There are many reasons why our applications must scale. A growing business need is certainly one important reason. But there are other reasons why architecting your application so it can scale is important for your business. I am going to talk about four basic reasons. Four reasons why scaling is important to the success of your business.

Reason #1. Support your growing business
This is the first, and the most basic, reason why your application has to scale. As your business grows, your application's needs grow. But there is more to it than that. There are three aspects of a growing business that impact your application and require it to scale. The first is the most obvious. As you get more customers, your customers make more use of your applications and they need more access to your website. This requires more capacity and more growth in the IT infrastructure for your sites. But that’s not the only aspect. As your application itself grows and matures, typically you will add more and more features and capabilities to the application. Each new feature and each new capability means customers will make more use of your application. As each customer uses more of your application, the application itself has to scale. Simply by your business maturing over time, even if the size of your customer base doesn’t grow, the computational needs of your application grow and your application must scale. And finally, as your business grows and matures, and your application grows and matures, your more complex application will require more engineers to work on it simultaneously, and they will work on more complex components. Your application might be rearchitected to be service based. It might add additional external dependencies and provisions. You will have to support more deployments and more updates. Your application and your application infrastructure will need to scale to support larger development teams and larger projects. This means you need more mature processes and procedures to scale the speed at which your larger team can improve your application.

Reason #2. Handle surprise situations
The second reason you need to be able to scale your application is to handle surprise situations and conditions. All businesses have their biggest days. These are the days when traffic is at its heaviest. These are days like Black Friday in retail, or the day of the Super Bowl for companies that advertise during that event, or open enrollment periods, or the start of travel season. But your business may have unexpected business bumps. These are the traffic increases that occur not because of a known big event, but because of an unknown or unexpected event. When an event occurs that is favorable to your business, you need to be ready to handle the increased load that comes with it. If you cannot handle the increased load, you risk losing the new business, and you risk disappointing your existing customers. Sudden business success can kill you if you can’t scale to meet the need. Just ask Robinhood Financial. Robinhood Financial is an investment company that provides investment management services. On Monday, March 2nd, 2020, Robinhood faced a business crisis. They faced a sudden increase in business. On that day, the United States stock market had a record-breaking day. This record-breaking day resulted in a record number of account signups and customer market transactions. This is good news for a company such as Robinhood. The problem was that their traffic load was not only high, it was too high. They needed to be able to respond to a huge spike in traffic to their application. Unfortunately, they were unable to keep up with the sudden demand. The result was a failure of their systems…and their application. The Robinhood Financial site was down…for a day and a half. This was during a peak stock market time, a time when their customers needed them the most. As a result, they lost out on a huge amount of easy new business, and they created hardship and disappointment for many of their existing customers. Potential new customers and existing customers alike were disappointed. A potential opportunity for huge growth and huge upside for the company…instead turned into a major negative event for the company. An event their founders had to publicly apologize for. All because they couldn’t scale to handle the surprise traffic load. To be successful, companies must be able to scale to meet sudden and unexpected traffic demands.

Reason #3. Handle a partial outage
The third reason is a sneaky one. You need to be able to scale in order to handle partial application outages. Partial outages can be a big problem for businesses. You have a large application, distributed across the globe in multiple data centers, or availability zones if you are operating in the cloud. You spread your application out like this for improved redundancy, availability, accessibility, and resiliency. But now, one of your data centers goes down. Of course, since you are operating in more data centers, a single data center outage is far more likely. This means more chances for something to go wrong in any one of them. But when a data center goes down, the traffic that would normally be sent to that data center has to be re-routed to other data centers. This results in a big uptick in traffic to those other data centers. Can those data centers handle the increased traffic? If not, those data centers could go down as well. The result is that your application can fail and become unavailable due to excessive traffic. This seems counterintuitive, but your plan to increase availability just made your application less available. Your plan for improved redundancy by increasing the number of data centers actually made your application more fragile. By increasing the number of data centers you were using, you increased the risk of a data center failure. And your application isn’t able to scale to handle the increased traffic needs of a data center failure. The result is an application meltdown. A step to improve availability makes availability worse. Can your other data centers accept the sudden challenge of handling the additional traffic that is sent to them from a failed data center? Can you respond to this sudden need for scale? You must, or your application is at risk.

Reason #4. Maintain availability
The fourth reason is to maintain availability. As your application gets more complex, it requires more interactions between many different components to work correctly. If one of those components begins to act sluggishly, it can cause performance issues in downstream services. These downstream performance issues can become worse, and more critical problems can occur, such as transaction timeouts, corruption, data loss, and ultimately, upset customers. A single service, slowing down for some simple reason, can cascade into a larger problem. And if your application can’t scale, the likelihood of individual components saturating and slowing down becomes a matter of when it will happen, not if it will happen. Lack of scalability turns into lack of availability. Lack of availability turns into failed customer expectations. Failed customer expectations turn into a negative impact on your business.

Scaling is Critical
Scaling is critical to your business success. Whether your business is growing or not, you need to be able to handle the growing and spiky traffic needs of your customers…at any time…or risk application failure, upset customers, and a business failure. Scaling isn’t just important, it is a business necessity.

Tech Tapas — Dynamic Cloud
There are two ways that people utilize the cloud. The first is by taking an application that is designed to run anywhere and running it on infrastructure that was created in the cloud. This is typically called the static cloud, because you create resources, such as servers, that are long lived, and use them to operate the application. The resource usage typically does not change much — or at all — over the long term as the application runs. The other way is to allocate only the resources you absolutely need, when you need them. Given that it is very easy to allocate and free resources in a cloud — especially a public cloud — it’s relatively easy to build an application that allocates the resources it requires when it requires them, and frees the resources when they are no longer required. This is called the dynamic cloud. The dynamic cloud is where the true power of cloud computing exists, and where the true benefits of using the cloud can be unlocked. The ability to consume only the resources you absolutely require at the moment, coupled with the ability to quickly allocate the additional resources you require as your application needs increase, gives you incredible capabilities in building highly scalable applications that can meet your needs no matter the amount of traffic sent to them, yet conserve money when traffic is low. When you perform a lift-and-shift migration of an application to the cloud, you typically move the application from a static data center to operating in a static cloud. You typically do not take advantage of the dynamic capabilities of cloud computing. Too often such application migrations end up being disappointments, because the application does not run any better in the cloud than it did in your own data center, yet the cloud resources may actually end up costing more money when used statically than equivalent resources in a static data center. The only way to truly see the advantage of using cloud computing is to utilize the dynamic cloud to build dynamic applications. Then you only consume — and pay for — the resources you require at the time you require them, yet you can increase the resources available to your application very quickly to handle sudden increases in traffic. Whether you are doing dynamic auto scaling, or using dynamic services such as Amazon DynamoDB, Google Bigtable, AWS Lambda, or Azure Functions, using the cloud in a dynamic fashion — using the dynamic cloud — is the key to effectively utilizing the cloud to improve, and hence modernize, your application.…
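To make the static-versus-dynamic distinction concrete, here is a minimal sketch of the kind of scale-out/scale-in decision a dynamic-cloud application applies continuously. The thresholds, instance counts, and the idea of a per-interval decision function are illustrative assumptions of mine, not a real cloud provider API or anything prescribed in the episode.

```python
# Illustrative sketch of a dynamic-cloud scaling decision: add capacity when
# utilization runs hot, release it when traffic drops. Thresholds and counts
# are hypothetical stand-ins, not a real cloud API.

def scaling_decision(current_instances: int, cpu_utilization: float,
                     scale_out_at: float = 0.70, scale_in_at: float = 0.30,
                     min_instances: int = 2) -> int:
    """Return the desired instance count for the next interval."""
    if cpu_utilization > scale_out_at:
        return current_instances + 1        # allocate only when actually needed
    if cpu_utilization < scale_in_at and current_instances > min_instances:
        return current_instances - 1        # release what is no longer required
    return current_instances

# Example: a traffic spike followed by a quiet period.
instances = 2
for utilization in [0.45, 0.72, 0.81, 0.78, 0.50, 0.22, 0.18]:
    instances = scaling_decision(instances, utilization)
    print(f"utilization={utilization:.2f} -> run {instances} instances")
```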