If you skipped the VPS chapter, just be aware that this chapter has a lot of similarities, because in many cases your VPS may be on someone else’s hardware and under their control, just as many Cloud Service Providers leverage AWS resources, which ultimately still run on real hardware. The Cloud is an abstraction (a lie).
Take the results from the higher level Asset Identification in the 30,000’ View chapter of Fascicle 0. Remove any that are not applicable. Add any newly discovered.
Using IaaS and even more so PaaS can provide great productivity gains, but everything comes at a cost. You don’t get productivity gains for free. You will be sacrificing something and usually that something is at least security. You no longer have control of your data.
Using cloud services can be a good thing especially for smaller businesses and start ups, but before the decision is made as to whether to use an external cloud provider or whether to use or create your own, there are some very important considerations to be made. We will discuss these in the Identify Risks and Countermeasures subsections.
If you are a start-up, just be aware that the speed you enjoy initially with a PaaS may not continue as your product moves from Proof of Concept to something customers actually use, especially if you decide to be more careful with your customers’ and your own IP, either by bringing it in-house or by entrusting it to a provider that takes security seriously rather than one that just says it does. We will be investigating these options through the Identify Risks subsection.
In regards to control of our environments, we are blindly trusting huge amounts of IP to Cloud Service Providers (CSPs). In fact, I have worked for many customers that insist on putting everything in The Cloud without much thought. Some have even said that they are not concerned with security. The problem is, they do not understand what is at risk. They may wonder why their competitor beats them to market as their progress and plans are intercepted. The best book I have read to date that reveals the problem with this blind yielding of everything is Bruce Schneier’s Data and Goliath. It is an eye-opening account of what we are doing and what the results are going to be.
Whenever you see that word “trust”, you are yielding control to the party you are trusting. When you trust an entity with your assets, you are giving them control. Are your assets their primary concern, or is it maximising their profits by using you and/or your data as their asset?
If you decide to use an external cloud provider, you need to be aware that whatever goes into The Cloud is almost completely out of your control. You may not be able to remove it once it is there, as you may have no visibility into whether or not the data is ever really removed from The Cloud.
If you deal with sensitive customer data, then you have an ethical and legal responsibility for it. If you are putting sensitive data in The Cloud, then you may well be neglecting that responsibility. You may not even retain legal ownership of it.
We will keep these assets in mind as we work through the rest of this chapter.
Some of the thinking around the process we went through at the top level in the 30,000’ View chapter of Fascicle 0 may be worth revisiting.
The shared responsibility model is one that many have not grasped or understood well. Let’s look at the responsibilities of the parties.
The CSP takes care of the infrastructure, not the customer-specific configuration of it, and due to the sheer scale of what they are building, is able to build in good security controls, in contrast to the average system administrator, who just does not have the resources or ability to focus on security to the same degree.
Due to that same sheer scale, the average CSP has a concentrated group of good security professionals, versus a business whose core business is often not closely related to security. So CSPs do provide good security mechanisms, but the customer has to know and care enough to use them.
CSPs, in creating the infrastructural architecture and building the components, frameworks, hardware, and platform software, are in most cases taking security seriously and doing a reasonable job.
CSP customers are expected to take care of their own security in terms of:
but all too often the customer’s responsibilities are neglected, which renders The Cloud no better for the customer in terms of security.
The primary problem with The Cloud is this: customers have the misconception that someone else is taking care of all of their security. That is not how the shared responsibility model works though. Yes, the CSP is probably taking care of the infrastructure security, but the other forms of security, such as those I just listed above, are even more important than before the shift to The Cloud, because these items are now the lowest hanging fruit for the attacker.
The following are a set of questions (verbatim) I have been asked recently, and that I hear similar versions of frequently:
CSPs are constantly changing their terms and conditions, and many components and aspects of what they offer. I’ve compiled a set of must-answer questions to quiz your CSP with as part of your threat modelling before (or even after) you sign their service agreement.
Most of these questions were already part of my Cloud vs In-house talk at the Saturn Architects conference. I recommend using these as a basis for identifying risks that may be important for you to consider. Then you should be well armed to come up with countermeasures and think of additional risks.
Both authorised and unauthorised Users are more careful about the actions they take or do not take when they know that their actions are recorded and have the potential to be watched
You will almost certainly not have complete control over the data you entrust to your CSP, but they will also not assume responsibility over the data you entrust to them, or how it is accessed. One example of this might be, how do you preserve secrecy on data at rest? For example, are you using the most suitable KDF and adjusting the number of iterations applied each year (as discussed in the MembershipReboot subsection of the Web Applications chapter) to the secrets stored in your data stores? The data you hand over to your CSP is no more secure than we discuss in the Management of Application Secrets subsections of the Web Applications chapter and in many cases has the potential to be less secure for the following reasons at least:
How is your data encrypted in transit (as discussed in the Management of Application Secrets subsections of the Web Applications chapter)? In reality, you have no idea what paths it will take once in your CSP’s possession, and it could very well be intercepted without your knowledge.
Hopefully you will have easy access to any and all logs, just like you would if it was your own network. That includes hosts, routing, firewall, and any other service logs
No CSP is going to last forever; termination or migration is inevitable, it is just a matter of when
As we discuss a little later in the Cloud Services Provider vs In-house subsection of Countermeasures, your data is governed by different people and jurisdictions depending on where it physically resides. CSPs have data centres in different countries and jurisdictions, each having different laws around data security
Who has access to view this data? What checks and controls are in place to make sure that this data can not be exfiltrated?
Make sure you are aware of what the uptime promises mean in terms of real time. Some CSPs will offer a 99.95% uptime SLA if you are running on a single availability zone, but closer to 100% if you run on multiple availability zones. Some CSPs do not have an SLA at all.
CSPs will often provide credits for the downtime, but these credits in many cases may not cover the losses you encounter during hot times
If the CSP can answer this with an “everything” and prove it, they have done a lot of work to make that possible, which shows a level of commitment to something security related. Just be aware that, as with any certification, it is just that: it does not prove a lot
CSPs that allow penetration testing of their environments demonstrate that they embrace transparency and openness; if their networks stand up to penetration tests, then they obviously take security seriously as well. Ideally this is what you are looking for. CSPs that do not permit penetration testing of their environments are usually trying to hide the fact that they know they have major insecurities or skill shortages in terms of security professionals, or are unaware of where their security posture lies and are not willing to have their faults demonstrated
This is another indicator, if the programme is run well, that the CSP is open and transparent about their security faults, and willing to mitigate them as soon as possible
A question that I hear frequently is: “What is more secure, building and maintaining your own cloud, or trusting a CSP to take care of security for you?”. That is a defective question, as discussed in the Shared Responsibility Model subsections. There are some aspects of security that the CSP has no knowledge of, and only you as the CSP customer can work security into those areas.
Going with a CSP means you are depending on their security professionals to design, build and maintain the infrastructure, frameworks, hardware and platforms. Usually the large CSPs will do a decent job of this. If you go with designing, building, and maintaining your own In-house cloud, then you will also be leveraging the skills of those that have created the cloud components you decide to use, but you will be responsible for the following along with many aspects of how these components fit together and interact with each other:
So in general, your engineers are going to have to be as good or better than those of the given CSP that you are comparing with in order to achieve similar levels of security at the infrastructure level.
Trust is an issue with The Cloud: you do not have control of your data, or of the people who create and administer the cloud environment you decide to use.
The smaller CSPs in many cases suffer from the same resourcing issues that many businesses do in regards to having solid security skills and workers engaged enough to apply security in-house. In general, in order to benefit from the Shared Responsibility Model of the CSP, it pays to go with the larger CSPs.
Most CSPs will have End User License Agreements (EULAs) that they reserve the right to change at any time. Do you actually read them when you sign up for a cloud service?
Hosting providers can be, and in many cases are, forced by governing authorities to give up your and your customers’ secrets. This is a really bad place to be in, and it is very commonplace now; you may not even know it has happened.
The NZ Herald covered a story in which senior lawyers and the Privacy Commissioner told the Herald of concerns about the practice which sees companies coerced into giving up information to the police. Instead of seeking a legal order, police have asked companies to hand over information to assist with the “maintenance of the law”, threatened them with prosecution if they tell the person in whom they are interested, and accepted data with no record keeping to show how often requests are made. The request from police carries no legal force at all, yet is regularly complied with.
As touched on in the CSP Evaluation questions, in many cases CSPs are outsourcing their outsourced services to several providers deep. They do not even have visibility themselves. Often the data is hosted in other jurisdictions. Control is lost.
This does not just apply to The Cloud vs In-house, it also applies to open technologies in The Cloud vs closed/proprietary offerings.
There is a certain reliance on vendor guarantees. These are not usually the issue though; the issue is usually us not fully understanding what our part to play in the shared responsibility model is.
What happens when you need to move from your current CSP? How much do you have invested in proprietary services such as serverless offerings? What would it cost your organisation to port to another CSPs environment? Are you getting so much benefit that it just does not matter? If you are thinking like this, then you could very well be missing many of the steps that you should be doing as your part of the shared responsibility model. We discuss these throughout this chapter. Serverless technologies really look great until you measure the costs of securing everything. Weigh up the costs and benefits.
There are plenty of single points of failure in The Cloud
The other chapters of this fascicle and Fascicle 0 have a lot in common with the topic of cloud security; if I had not already provided coverage there, I would be doing so now.
Now would be a good time to orient / reorient yourself with the related topics / concepts from the other chapters. From here on in, I will be assuming you can apply the knowledge from the other chapters to the topic of cloud security without me having to revisit large sections of it, specifically:
Web Applications chapter
You might ask what people have to do with cloud security? A large amount of my experience working as a consulting Architect, Engineer, and Security Pro for many organisations and their teams has shown me that, in the majority of security incidents, reviews, tests and redesigns, the root cause stems back to people defects, as recognised by the number one issue listed under the CSP Customer Responsibility of the Shared Responsibility Model. As people, we are our own worst enemies. We can be the weakest and also the strongest links in the security chain. The responsibility falls squarely in our own laps.
You will notice that most of the defects addressed in this chapter come down to people:
With the shift to The Cloud, AppSec has become more important than it used to be, recognised and discussed:
The reason being that, in general, as discussed in the Shared Responsibility Model subsection, the dedicated security resources, focus, awareness, and engagement of our major CSPs are usually greater than most organisations have access to. This pushes the target areas for attackers further up the tree. People, followed by AppSec, are now usually the lowest hanging fruit for attackers.
The network between the components you decide to use in The Cloud will almost certainly no longer be administered by your network administrator(s), but rather by you as a Software Engineer. That is right, networks are now expressed as code, and because coding is part of your responsibility as a Software Engineer, the network will more than likely be left to you to design and code, so you had better have a good understanding of Network Security.
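As a taste of what that looks like, the following is a minimal sketch (using Python and boto3, purely as an illustration) of expressing a small piece of network as code: a security group that permits only the ingress the web tier actually needs. The region, VPC id and group name are hypothetical.

```python
# A minimal sketch of "network as code": names, CIDR ranges and the VPC id are hypothetical.
import boto3

ec2 = boto3.client("ec2", region_name="ap-southeast-2")

# Create a security group that only allows what is actually needed.
sg = ec2.create_security_group(
    GroupName="web-tier",
    Description="Least privilege ingress for the web tier",
    VpcId="vpc-0123456789abcdef0",  # hypothetical VPC id
)

# Allow HTTPS from anywhere, and nothing else. No SSH from 0.0.0.0/0,
# no "allow all" rules added just to get it working.
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "HTTPS only"}],
    }],
)
```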
The principle of Least Privilege is an essential aspect of defence in depth, stopping an attacker from progressing.
The attack and demise of Code Spaces is a good example of what happens when least privilege is not kept on top of. An attacker gained unauthorised access to the Code Spaces AWS console and deleted everything attached to their account. Code Spaces was no more; they could not recover.
In most organisations I work for as an architect or engineer, I see many cases of violating the principle of least privilege. We discuss this principle in many places through this entire book series. It is a concept that needs to become part of your instincts. The principle of least privilege means that no actor should be given more privileges than is necessary to do their job.
Here are some examples of violating least privilege:
Hopefully you are getting the idea of what least privilege is, and subsequently how it breaks down in a cloud environment. Some examples:
The default on AWS EC2 instances is to have a single user (root). There is no audit trail with a bunch of developers all using the same login. Whenever anything happens on any of these machine instances, it is always the fault of user
ubuntu on an Ubuntu AMI,
ec2-user on a RHEL AMI, or
centos on a Centos AMI. There are so many things wrong with this approach.
Sharing, and even unnecessarily using, the root user, as I discuss in the Credentials and Other Secrets subsections. In this case, the business owners lost their business.
As a Consultant / contract Architect, Engineer, I see a lot of mishandling of sensitive information. The following are some examples.
The following are some of the ways I see private keys mishandled.
SSH key-pair auth is no better than password auth if it is abused in the following way; in fact it may even be worse. What I have seen some organisations do is store a single private key with no pass-phrase for all of their EC2 instances in their developer wiki. All or many developers have access to this, with the idea being that they just copy the key from the wiki to their local
~/.ssh/. There are a number of things wrong with this.
Most developers will also blindly accept what they think are the server key fingerprints without verifying them, thus opening themselves up to a MitM attack, as discussed in the VPS chapter under the SSH subsection. This very quickly moves from just a technical issue to a cultural one. People are trained to just accept that the server is who it says it is; the fact that they have to verify the fingerprint is essentially a step that gets in their way.
When Docker reads the instructions in the following
Dockerfile, an image is created that copies our certificate, private key, and any other secrets you have declared, and bakes them into an additional layer, forming the resulting image. Both COPY and
ADD will bake whatever you are copying or adding into an additional layer or delta, as discussed in the Consumption from Registries Docker subsection in the VPS chapter. Whoever can access this image from a public or less public registry now has access to your certificate and, even worse, your private key.
Anyone can see how these images were built using the likes of the following tools:
The ENV command similarly bakes the dirty little secret value as the mySecret key into the image layer.
Sharing accounts, especially super-user accounts on the likes of machine instances, and even worse, your CSP IAM account(s), and worse still, the account root user. I have worked for organisations that had only the single default AWS account root user you are given when you first sign up to AWS, shared amongst several teams of Developers and managers via the organisation’s wiki, which in itself is a big security risk. Subsequently, the organisation I am thinking of had one of the business owners go rogue, change the single password, and lock everyone else out.
Developers and others putting user-names and passwords in company wikis, source control, or anywhere where there is a reasonably good chance that an unauthorised person will be able to view them with a little to moderate amount of persistence, as discussed above in the SSH section. When you have a team of Developers sharing passwords, the weakest link is usually very weak, and that is only if you are considering outsiders to be a risk, which, according to the study I discussed in the Fortress Mentality subsection of the Network chapter, would be a mistake, with about half of all security incidents being carried out from inside of an organisation.
Whatever you use to get work done in The Cloud programmatically, you are going to need to authenticate the process at some point. I see a lot of passwords in configuration files, stored in:
This is a major insecurity.
Serverless is not serverless, but the idea is that as a Software Engineer, you do not think about the physical machines that your code will run on. You can also focus on small pieces of functionality without understanding all of the interactions and relationships of the code you write.
There is a lot of implicit trust put in third party services that components of your serverless architecture consume.
Any perimeters that you used to have, or at least thought you had, are gone. We discussed this in the Fortress Mentality subsection of the Network chapter.
Azure has Functions.
The complexity alone with AWS causes a lot of Developers to just “get it working” if they are lucky, then push it to production. Of course this has the side effect that security is in most cases overlooked. With AWS Lambda, you need to first:
So… What is security when it comes to the Serverless paradigm?
What changes is the target areas for the attacker; they just move closer to application security. In order of most important first, we have:
Rich Jones demonstrated what can happen if you fail at the above three points in AWS in his talk “Gone in 60 Milliseconds”:
/tmp is writeable
The compute executing the functions you supply is short lived. With AWS, containers are used and reused, providing your function runs at least once approximately every four minutes and thirty seconds, according to Rich Jones’ talk. So the idea of hardware DoS is less likely, but billing DoS is a real issue.
AWS Lambda will by default allow any given function a concurrent execution of 1000 per region.
The only really glaringly obvious risks with the management of configuration and infrastructure as code are the management of secrets, and most of the other forms of information security. “Huh?” I hear you say. Let me try to unpack that statement.
When you create and configure infrastructure as code, you are essentially combining many technical aspects: machine instances (addressed in the VPS chapter), networking (addressed in the Network chapter), The Cloud obviously, and of course your applications (addressed in the Web Applications chapter), and baking them all into code to be executed. If you create security defects as part of the configuration or infrastructure, then lock them up in code, you will have the same consistent security defects each time that code is run. Hence, why Software Engineers now need to understand so much more than they used to about security. We are now responsible for so much more than we used to be.
Now we will focus on a collection of the largest providers.
The AWS section is intended as an overflow for items that have not been covered elsewhere in this chapter, but require some attention.
One of the resources I have found very useful to understand some of the risks along with auditing whether they exist currently, and countermeasures, including clear direction on how to apply them, is the CIS AWS Foundations document. This is well worth following along with as you read through this chapter.
AWS is continually announcing and releasing new products, features and configuration options. The attack surface just keeps expanding. AWS does an incredible job of providing security features and options for its customers, but… just as the AWS Shared Responsibility Model states, “security in the cloud is the responsibility of the customer”. AWS provides the security features; you have to decide to use them and educate yourself on doing so. Obviously if you are reading this, you are already well down this path. If you fail to use and configure correctly what AWS has provided, your attackers will at the very minimum use your resources for evil, and you will foot the bill. Even more likely, they will attack and steal your business assets, and bring your organisation to its knees.
Password-less sudo. A low privileged user can operate with root privileges. This is essentially as bad as root logins.
Revisit the Countermeasures subsection of the first chapter of Fascicle 0.
As I briefly touch on in the CSP Account Single User Root subsection, Canarytokens are excellent tokens you can drop anywhere on your infrastructure; when an attacker opens one of these tokens, an email is sent to a pre-defined email address with a specific message that you define. This provides early warning that someone unfamiliar with your infrastructure is running things that do not normally get run. There are quite a few different tokens available, and new ones are being added every so often. These tokens are very quick and also free to generate, and you can drop them wherever you like on your infrastructure. Haroon Meer discusses these near the end of the Network Security show I hosted for Software Engineering Radio.
The following responsibilities are those that you need to have a good understanding of in order to establish a good level of security when operating in The Cloud.
There is not a lot you can do about this; just be aware of what you are buying into before you do so. AWS for example states: “Customers retain control of what security they choose to implement to protect their own content, platform, applications, systems and networks, no differently than they would for applications in an on-site datacenter.”
If you leverage The Cloud, make sure the following aspects of security are all at an excellent level:
The following is in response to the set of frequently asked questions under the risks subsection of CSP Customer Responsibility:
(A): In the past, many aspects of network security were the responsibility of the Network Administrators; with the move to The Cloud, this has to a large degree changed. The networks established (intentionally or not) between the components we are leveraging and creating in The Cloud are a result of Infrastructure and Configuration Management, often (and rightly so) expressed as code. Infrastructure as Code (IaC). As discussed in the Network Security subsection, this is now the responsibility of the Software Engineer
(A): TLS is one very small area of network security. Its implementation as HTTPS and the PKI model is effectively broken. If TLS is your only saviour, putting it bluntly, you are without hope. The Network chapter covers the tip of the network security iceberg; network security is a huge topic, and one that has many books written about it, along with other resources, that provide more in-depth coverage than I can provide as part of a holistic view of security for Software Engineers. Software Engineers must come to grips with the fact that they need to implement defence in depth
(A): For this statement, please refer to the VPS chapter for your responsibilities as a Software Engineer in regards to “the machine”. In regards to “the network”, please refer to the Network Security subsection
(A): No, for application security, see the Web Applications chapter. For network security, see the Network chapter. Again, as Software Engineers, you are now responsible for all aspects of information security
(A): If you are still reading this, I’m pretty sure you know the answer; please share it with other Developers and Engineers when you receive the same questions
Once you have sprung the questions from the CSP Evaluation subsection in the Identify Risks subsection on your service provider and received their answers, you will be in a good position to feed these into the following subsections.
On AWS you can enable CloudTrail to log all of your API calls, command line tools, SDKs, and Console interactions. This will provide a good amount of visibility around who has been accessing the AWS resources and Identities
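As an illustration only, a minimal boto3 sketch of turning CloudTrail on could look something like the following; the trail and bucket names are hypothetical, and the bucket is assumed to already have a policy allowing CloudTrail to write to it.

```python
# A minimal sketch, assuming an existing S3 bucket with a policy that allows
# CloudTrail to write to it. Bucket and trail names are hypothetical.
import boto3

cloudtrail = boto3.client("cloudtrail")

cloudtrail.create_trail(
    Name="org-wide-audit-trail",
    S3BucketName="my-cloudtrail-logs",   # hypothetical bucket
    IsMultiRegionTrail=True,             # capture activity in every region
    IncludeGlobalServiceEvents=True,     # include IAM, STS and other global calls
)

# A trail does nothing until logging is started.
cloudtrail.start_logging(Name="org-wide-audit-trail")
```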
Make sure you are completely clear on who is responsible for which data, where and when. It is not a matter of if your data will be stolen, but more a matter of when. Know your responsibilities. As discussed in the Web Applications chapter under the Data-store Compromise subsection… Data-store Compromise is one of the 6 top threats facing New Zealand, and these types of breaches are happening daily.
Also consider data security insurance
I have discussed in many places that we should be aiming to have all communications on any given network encrypted. This is usually not too onerous to establish on your own network, but may in some cases not be possible on a CSP’s network, especially if you are using proprietary/serverless technologies. If you are using usual machine instances, then in most cases, the CSP’s infrastructure is logically not really any different than an in-house network, in which case you can encrypt your own communications.
AWS also offers four different types of VPN connections to your VPC
If you do not have access to logs, then you are flying blind; you have no idea what is happening around you. How much does the CSP strip out of the logs before they allow you to view them? It is really important to weigh up what you will have visibility of, and what you will not have visibility of, in order to work out where you may be vulnerable.
Can the CSP provide guarantees that those vulnerable areas are taken care of by them? Make sure you are comfortable with the amount of visibility you will and will not have up front, as unless you make sure blind spots are covered, you could be unnecessarily opening yourself up to attack. Some of the CSP’s log aggregators could be flaky, for example.
With the likes of machine instances and network components, you should be taking the same responsibilities as you would if you were self hosting. I addressed these in the VPS and Network chapters under the Lack of Visibility subsections.
In terms of visibility into the Cloud infrastructure, most decent CSPs provide the tooling, you just need to use it.
As mentioned in point 1 above and Violations of Least Privilege countermeasures, AWS provides CloudTrail to log API calls, Management Console actions, SDKs, CLI tools, and other AWS services. As usual, AWS has good documentation around what sort of log events are captured, what form they take, and the plethora of services you can integrate with CloudTrail. As well as viewing and analysing account activity, you can define AWS Lambda functions to be run on the
s3:ObjectCreated:* event that is published by S3 when CloudTrail drops its logs in an S3 bucket.
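A minimal sketch of such a Lambda function might look like the following; it assumes the standard S3 event structure and gzipped CloudTrail log files, and simply prints any root user activity it finds. What you actually alert on is up to you.

```python
# A minimal sketch of a Lambda handler wired to the s3:ObjectCreated:* event
# on the bucket CloudTrail drops its logs into. It loads each gzipped log file
# and prints any API calls made by the account root user.
import gzip
import json

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        trail = json.loads(gzip.decompress(body))
        for entry in trail.get("Records", []):
            if entry.get("userIdentity", {}).get("type") == "Root":
                print("Root user activity:", entry["eventName"], entry["eventTime"])
```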
AWS CloudWatch can be used to collect and track your resource and application metrics, and to react to collected events with the likes of Lambda functions to do your bidding.
There are also a collection of logging specific items that you should review in the Logging subsection of the CIS AWS Foundations document
Make sure you have an exit and/or migration strategy planned as part of entering into an agreement with your chosen CSP. Make sure you have as part of your contract with your chosen CSP:
Do not assume that your data in The Cloud in another country is governed by the same laws as it is in your country. Make sure you are aware of the laws that apply to your data, depending on where it is
Technically, anyone can. In the case of AWS, they will not purposely disclose your data to anyone, unless required to by law. There are a few things you need to consider here such as:
Count this cost before signing up to the CSP
AWS has a list of their compliance certificates
You will not need to go through this process of requesting permission from your own company to carry out penetration testing, and if you do, there should be a lot fewer restrictions in place.
AWS allow customers to submit requests to penetration test to and from some AWS EC2 and RDS instance types that you own. All other AWS services are not permitted to be tested or tested from.
GCP does not require penetration testers to contact them before beginning testing of their GCP hosted services, so long as they abide by the Acceptable Use Policy and the Terms of Service.
Heroku are happy for you to penetration test your applications running on their PaaS. If you are performing automated security scans, you will need to give them two business days notice before you begin testing.
Azure allows penetration testing of your applications and services running in Azure, you just need to fill out their form. In order to use Azure to perform penetration testing on other targets, you do not need permission providing you are not DDoS testing.
If the CSP is of a reasonable size and is not already running bug bounties, this is a sign that security could be taken more seriously than it is.
AWS has a bug bounty program.
GCP states that if a bug is found in the google infrastructure, the penetration tester is encouraged to submit it to their bug bounty program.
Heroku offer a bug bounty program.
Azure offer a bug bounty program.
It depends on the CSP, and many things about your organisation. Each CSP does things differently, has strengths and weaknesses in different areas of the shared responsibility model, has different specialities, is governed by different people and jurisdictions (USA vs Sweden for example), and some are less security conscious than others. The largest factor in this question is your organisation. How security conscious and capable of implementing a secure cloud environment are your workers?
You can have a more secure cloud environment than any CSP if you decide to do so and have the necessary resources to build it. If you don’t decide to and/or don’t have the necessary resources, then most well known CSPs will probably be doing a better job than your organisation.
Then you need to consider what you are using the given CSP’s services for. If you are creating and deploying applications, then your applications will be a weaker link in the security chain; this is a very common case and one that is often overlooked. To attempt to address application security, I wrote the Web Applications chapter.
Your attackers will attack your weakest area first; in most cases this is not your CSP, but your organisation’s people, due to lack of knowledge, passion, engagement, or a combination of these. If you have physical premises, they can often be an easy target also. Usually application security follows closely after people security. This is why I have the Physical and People chapters in Fascicle 0 of this book series; they are also the most commonly overlooked. The reason I added the Web Applications chapter last in this fascicle was that I wanted to help you build a solid foundation of security in the other areas often overlooked before we addressed application security, and I also wanted it to be what sticks in your mind once you have read this fascicle.
Based on the threat modelling I hope you have done through each chapter, which was first introduced in Fascicle 0, you should be starting to work out where cloud security rates on your list of risks to your assets. By the end of this chapter, you should have an even better idea.
The fate of your and your customers data is in your hands. If you have the resources to provide the necessary security then you are better off with an In-house cloud, if not, the opposite is true.
If you go with an In-house cloud, you should have tighter control over the people creating and administering it; this is good if they have the necessary skills and experience, if not, then the opposite is true again.
You and any In-house cloud environment you establish are not subject to changing EULAs.
If you are using an In-house cloud and find yourself in a place where you have made it possible for your customers secrets to be read, and you are being forced by the authorities to give up secrets, you will know about it and be able to react appropriately, invoke your incident response team(s) and procedures.
If you use an In-house cloud, you decide where services & data reside.
You have to weigh up the vendor benefits and possible cost savings vs how hard / costly it is to move away from them when you need to.
Many projects are locked into technology decisions / offerings, libraries, services from the design stage, and are unable to swap these at a later stage without incurring significant cost. If the offering that was chosen is proprietary, then it makes it all the more difficult to swap if and when it makes sense to do so.
Sometimes it can cost more up front to go with an open (non proprietary) offering, because somehow the proprietary offering has streamlined the development, deployment, and maintainability process; that is the whole point of proprietary offerings, right? Sometimes the open offering can actually be the cheaper option, due to proprietary offerings usually incurring an additional learning or upskilling cost for the teams/people involved.
Often technology choices are chosen because they are the “new shiny”, it is just what everyone else seems to be using, or there is a lot of buzz or noise around it.
An analogy: do Software Developers write non-testable code because it is cheaper to write? Many/most code shops do. I discussed test driven development (TDD) in the Process and Practises chapter of Fascicle 0, and I have blogged, spoken and run workshops on the topic of testability extensively. Writing non-testable code is a short sighted approach. Code is read, modified and extended many times more than it is written up front.
If you are putting all your cost savings on the initial write, and failing to consider all the times that modification will be attempted, then you are missing huge cost savings. Taking an initial hit up front to write testable code, that is, code that has the properties of maintainability and extensibility defined by the Liskov Substitution Principle, will set you up so that the interface is not coupled to the implementation.
If you get your thought process right up front, and make sure you can swap components (implementation) out-in at will, maintainability and extensibility are not just possible, but a pleasure to do.
The following are some of the countermeasures to the single points of failure in The Cloud. The idea is to create redundancy on items that we can not do without:
As I mentioned in the Identify Risks Review Other Chapters subsection, please make sure you are familiar with the related concepts discussed.
Most of the countermeasures are discussed in the People chapter of Fascicle 0
Full coverage in the Web Applications chapter.
When you create IAM policies, grant only the permissions required to perform the task(s) necessary for the given users. If the user needs additional permissions, then they can be added, rather than adding everything up front and potentially having to remove again at some stage. Adding as required, rather than removing as required will cause much less friction technically and socially.
For example, in AWS, you need to keep a close watch on which permissions are assigned to policies that your groups and roles have attached, and subsequently which groups and roles your users are in or part of.
The sequence of how the granting of least privilege looks in AWS is as follows, other CSPs will be similar:
Regularly review all of the IAM policies you are using, making sure only the required permissions (Services, Access Levels, and Resources) are available to the users and/or groups attached to the specific policies.
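To make the idea concrete, the following is a minimal boto3 sketch (an illustration, not a recommendation of specific permissions) of a policy that grants only read access to a single S3 bucket, attached to a single group. The bucket, group, and policy names are hypothetical.

```python
# A minimal sketch of granting only what a task needs: a managed policy that
# allows read-only access to one S3 bucket, attached to one group.
import json

import boto3

iam = boto3.client("iam")

policy_document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::reporting-bucket",      # hypothetical bucket
            "arn:aws:s3:::reporting-bucket/*",
        ],
    }],
}

policy = iam.create_policy(
    PolicyName="reporting-read-only",
    PolicyDocument=json.dumps(policy_document),
)

iam.attach_group_policy(
    GroupName="reporting-team",                   # hypothetical group
    PolicyArn=policy["Policy"]["Arn"],
)
```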
Enable Multi Factor Authentication (MFA) on the root user, and at a minimum on all IAM users with console access, especially privileged users. AWS provides the ability to mandate that users use MFA; you can do this by creating a new managed policy based on the AWS DelegateManagementofMFA_policydocument template, attaching the new policy to a group that you have created, and adding users that must use MFA to that group. As usual, AWS has documentation on the process.
The Access Advisor tab, which is visible on the IAM console details page for Users, Groups, Roles, or Policies after you select a list item, provides information about which services are accessible from any of your users, groups, or roles. This can be helpful for auditing permissions that should not be available to any of your users that are part of the group, role or policy you selected.
The IAM Policy Simulator which is accessible from the IAM console is also good for granular reporting on the permissions of your specific Users, Groups and Roles, filtered by service and actions.
AWS Trusted Advisor should be run periodically to check for security issues. Accessible from the Console, CLI and API. Trusted Advisor has a collection of core checks and recommendations which are free to use, such as security groups, specific ports unrestricted, IAM use, MFA on root user, EBS and RDS public snapshots.
AWS Config records IAM policies assigned to users, groups, or roles, and EC2 security groups, including port rules for any given time. Changes to your configuration settings can trigger Amazon Simple Notification Service (SNS) notifications, which you can have sent to those tasked with controlling changes to your configurations.
Your custom rules can be codified and thus source controlled. AWS calls this Compliance as Code. I discussed AWS CloudTrail briefly in item 1 of the CSP Evaluation countermeasures subsection. AWS Config is integrated with CloudTrail which captures all API calls from AWS Config console or API, SDKs, CLI tools, and other AWS services. The information collected by CloudTrail provides insight on what request was made, from which IP address, by who, and when
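As a small illustration of compliance as code, the following boto3 sketch adds one of the AWS managed Config rules, which flags the account if the root user has no MFA. It assumes AWS Config recording is already set up for the account.

```python
# A minimal sketch, assuming AWS Config recording is already enabled:
# codify a compliance check with a managed Config rule.
import boto3

config = boto3.client("config")

config.put_config_rule(
    ConfigRule={
        "ConfigRuleName": "root-account-mfa-enabled",
        "Source": {
            "Owner": "AWS",
            "SourceIdentifier": "ROOT_ACCOUNT_MFA_ENABLED",  # AWS managed rule
        },
    }
)
```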
There are also a collection of IAM specific items that you should review in the Identity and Access Management subsection of the CIS AWS Foundations document.
As part of the VPS and container builds, there should be specific users created for specific jobs. Every user within your organisation that needs VPS access should have their own user account on every VPS, including SSH access if this is required (ideally this should be automated). With Docker, I discussed how this is done in the Dockerfile.
Drive a least privilege policy around this, configuring a strong password policy for your users, and implement multi-factor authentication which will help with poor password selection of users. I discuss this in more depth in the Storage of Secrets subsection.
As I discuss in the Credentials and Other Secrets Countermeasures subsection of this chapter, create multiple accounts with least privileges required for each, the root user should hardly ever be used. Create groups and attach restricted policies to them, then add the specific users to them.
As I discussed in the Credentials and Other Secrets countermeasures subsection, there should be almost no reason to generate key(s) for the AWS Command Line Tools for the AWS account root user, but if you do, consider setting up notifications for when they are used. As usual, AWS has plenty of documentation on the topic.
Another idea is to set-up monitoring and notifications on activity of your AWS account root user. AWS documentation explains how to do this.
There are also a collection of monitoring specific items that you should review in the Monitoring subsection of the CIS AWS Foundations document.
Another great idea is to generate an AWS key Canarytoken from canarytokens.org, and put it somewhere more obvious than your real AWS key(s). When someone uses it, you will be automatically notified. I discussed these with Haroon Meer on the Software Engineering Radio Network Security podcast. Jay also wrote a blog post on the thinkst blog on how you can set this up and what the internal workings look like.
Also consider rotating your IAM access keys to your CSP services. AWS EC2, for example, provides auto-expiring, auto-renewing access keys through the use of roles.
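Where roles are not an option and you are stuck with long lived keys, rotation can be scripted. The following is a minimal boto3 sketch of one way to do it by hand; the user name and old key id are hypothetical.

```python
# A minimal sketch of rotating an IAM user's access keys: create the new key,
# switch your tooling over to it, then disable and later delete the old one.
import boto3

iam = boto3.client("iam")

# 1. Create the replacement key (a user can hold two keys at once).
new_key = iam.create_access_key(UserName="deploy-bot")["AccessKey"]
print("New key id:", new_key["AccessKeyId"])

# 2. Update whatever consumes the old key to use the new one, then disable
#    the old key rather than deleting it straight away, so you can roll back.
iam.update_access_key(
    UserName="deploy-bot",
    AccessKeyId="AKIAOLDKEYIDEXAMPLE",  # hypothetical old key id
    Status="Inactive",
)

# 3. Once you are confident nothing still relies on it, delete it.
iam.delete_access_key(UserName="deploy-bot", AccessKeyId="AKIAOLDKEYIDEXAMPLE")
```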
In this section I discuss some techniques to handle our sensitive information in a safer manner.
If you have “secrets” in source control or wikis, they are probably not secret. Remove them and change the secret (password, key, whatever it is). Github provides guidance on removing sensitive data from a repository.
Also consider using git-crypt.
Use different access keys for each service and application requiring them.
Rotate access keys.
The following are some techniques to better handle private keys.
There are many ways to harden SSH as we discussed in the SSH subsection in the VPS chapter. Usually the issue will lie with lack of knowledge, desire and a dysfunctional culture in the work place. You will need to address the people issues before looking at basic SSH hardening techniques.
Ideally SSH access should be reduced to a select few. Most of the work we do now by SSHing should be automated. If you have a look at all the commands in history on any of the VPSs, most of the commands are either deployment or manual monitoring which should all be automated.
When you create an AWS EC2 instance you can create a key pair using EC2 or you can provide your own, either way, to be able to log-in to your instance, you need to have provided EC2 with the public key of your key pair and specified it by name.
Every user should have their own key-pair; the private part should always be private, kept in the user’s local
~/.ssh/ directory (not the server) with permissions
600 or more restrictive, not shared on your developer wiki or anywhere else for that matter. The public part can be put on every server that the user needs access to. There is no excuse for every user not to have their own key pair, you can have up to five thousand key pairs per AWS region. AWS has clear directions on how to create additional users and provide SSH access with their own key pairs.
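One related improvement worth mentioning: rather than having EC2 generate a key pair and hand you the private part, each user can generate their own key pair locally and you import only the public part, so the private key never leaves their machine. A minimal boto3 sketch, with hypothetical names and paths:

```python
# A minimal sketch of giving each user their own key pair rather than sharing
# one: the user generates their key pair locally (e.g. with ssh-keygen) and
# only the public part is imported into EC2 under their own name.
import boto3

ec2 = boto3.client("ec2")

with open("/tmp/alice_id_rsa.pub", "rb") as public_key:  # hypothetical path
    ec2.import_key_pair(
        KeyName="alice",                       # one key pair per user
        PublicKeyMaterial=public_key.read(),   # never the private part
    )
```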
For generic confirmation of the host’s SSH key fingerprint as you are prompted before establishing the SSH connection, follow the procedure I laid out for Establishing your SSH Servers Key Fingerprint in the VPS chapter, and make it organisational policy. We should never blindly accept key fingerprints. The key fingerprints should be stored in a relatively secure place, so that only trusted parties can modify them. What I would like to see happen is that, as part of the server creation process, the place (probably the wiki) that specifies the key fingerprints is automatically updated by something on the VPS that keeps watch of the key fingerprints. Something like Monit, as discussed in the VPS chapter, would be capable of the monitoring and firing a script to do this.
To SSH to an EC2 instance, you will have to view the console output of the keys being generated. You can see this only for the first run of the instance when it is being created, this can be seen by first fetching:
Then to SSH to your EC2 instance: the command to use can be seen by fetching:
So how do we stop baking secrets into our Docker images?
The easiest way is to just not add secrets to the process of building your images. You can add them at run time in several ways. If you think back to the Namespaces Docker subsection in the VPS chapter, we used volumes. This allows us to keep the secrets entirely out of the image and only include in the container as mounted host directories. This is how we would do it, rather than adding those secrets to the
An even easier technique is to move the adding of your secrets into the
docker-compose.yml file, thus saving all that typing every time you want to run the container:
env_file we can hide our environment variables in the
Dockerfile would now look like the following, even our config is volume mounted and will no longer reside in our image:
Create multiple users with the least privileges required for each to do their job, discussed below.
Create and enforce password policies, discussed below.
Funnily enough, with the AWS account root user story I mentioned in the Risks subsection, I had created a report detailing this as one of the most critical issues that needed addressing several weeks before everyone but one person lost access.
If your business is in The Cloud, the account root user is one of your most valuable assets, do not share it with anyone, and only use it when essential.
Protecting against outsiders
The most effective alternative to storing user-names and passwords in an insecure manner is to use a group or team password manager. There are quite a few offerings available with all sorts of different attributes. The following are some of the points you will need to consider as part of your selection process:
The following were my personal top three, with No. 1 being my preference, based on research I performed for one of my customers recently. All the points above were considered for a collection of about ten team password managers that I reviewed:
Protecting against insiders as well
The above alone is not going to stop an account take over if you are sharing the likes of the AWS account root user email and password, even if it is in a group password manager. As AWS have already stated, only use the root user for what is absolutely essential (remember: least privilege), this is usually just to create an Administrators group to which you attach the
AdministratorAccess managed policy, then add any new IAM users to that group that require administrative access.
Once you have created IAM users within an Administrators group as mentioned above, these users should set up groups to which you attach further restricted managed policies, such as a group for
PowerUserAccess, a group for
ReadOnlyAccess, a group for
IAMFullAccess, progressively becoming more restrictive. Use the most restrictive group possible in order to achieve specific tasks, simply assigning users to the groups you have created.
Also use multi-factor authentication.
Your AWS users do not get created with access keys for programmatic access; do not create these unless you actually need them, and again consider least privilege: there should be almost no reason to create an access key for the root user.
Configure strong password policies for your users, make sure they are using personal password managers and know how to generate long complex passwords.
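A strong password policy can also be codified. The following is a minimal boto3 sketch; the specific numbers are illustrative only, not a recommendation.

```python
# A minimal sketch of enforcing a password policy for IAM users;
# the numbers are illustrative, not a recommendation.
import boto3

iam = boto3.client("iam")

iam.update_account_password_policy(
    MinimumPasswordLength=20,          # long enough to push users toward password managers
    RequireSymbols=True,
    RequireNumbers=True,
    RequireUppercaseCharacters=True,
    RequireLowercaseCharacters=True,
    PasswordReusePrevention=24,        # stop old passwords being cycled back in
    MaxPasswordAge=90,
)
```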
There are many places in software that require access to secrets in order to communicate with services, APIs, and datastores. Configuration and infrastructure management systems have the problem of storing and accessing these secrets in a secure manner.
HashiCorp Vault. The most fully featured of these tools, Vault has the following attributes/features:
/run/secrets/<secret_name> in Linux,
C:\ProgramData\Docker\secrets in Windows
Ansible is an Open Source configuration management tool, and has a simple secrets management feature.
AWS Key Management Service (KMS)
AWS has Parameter Store (a sketch of its use follows below)
Also see the additional resources for other similar tools.
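Tying the Parameter Store option above back to the storage of secrets problem, the following is a minimal boto3 sketch of fetching a SecureString parameter at run time instead of baking the secret into an image or configuration file. The parameter name is hypothetical, and the calling role is assumed to have ssm:GetParameter plus kms:Decrypt on the relevant key.

```python
# A minimal sketch of reading a secret out of Parameter Store at run time
# instead of baking it into an image or config file.
import boto3

ssm = boto3.client("ssm")

response = ssm.get_parameter(
    Name="/myapp/production/db-password",  # hypothetical parameter name
    WithDecryption=True,                    # SecureString parameters are encrypted with KMS
)
db_password = response["Parameter"]["Value"]
```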
Serverless is another form of separation of concerns / decoupling. Serverless is yet another attempt to coerce Software Developers into abiding by the Object Oriented (OO) SOLID principles that the vast majority of Developers never quite understood. Serverless forces the microservice way of thinking.
Serverless mandates the reactive / event driven approach that insists that our code features stand alone without the tight coupling of many services that we often seem to have. Serverless forces us to split our databases out from our business logic. Serverless goes a long way to forcing us to write testable code, and as I have said so many times, testable code is good code, code that is easy to maintain and extend, thus abiding by the Open/closed principle.
Serverless provides another step up in terms of abstraction, but at the same time allows you to focus on the code, which as a Developer, sounds great.
With AWS Lambda, you only pay when your code executes, as opposed to paying for machine instances, or with Heroku for the entire time your application is running on their compute, even if the application code is not executing. AWS Lambda and similar offerings allow granular costing, thus passing on cost savings due to many customers all using the same hardware.
AWS Lambda and similar offerings allow us to not think about machine/OS and language environment patching, compute resource capacity or scaling. You are now trusting your CSP to do these things. There are no maintenance windows or scheduled downtimes. Lambda is also currently free for up to one million requests per month, and does not expire after twelve months. This in itself is quite compelling to leverage the service.
When you consume third party services (APIs, functions, etc), you are in essence outsourcing what ever you send or receive from them. How is that service handling what you pass to it or receive from it? How do you know that the service is who you think it is, are you checking its TLS certificate? Is the data in transit encrypted? Just as I discuss below under Functions, you are sending and receiving from a potentially untrusted service. This all increases the attack surface.
Not really much different to the Fortress Mentality subsection discussed in the Network chapter.
With AWS Lambda, as well as getting your application security right, you also need to fully understand the Permissions Model, apply it, and protect your API gateway with a key.
For help with consuming all the free and open source code, review the Consuming Free and Open Source countermeasures subsection of the Web Applications chapter. Snyk has a Serverless offering also. Every function you add increases the attack surface, and brings all the risks that come with integrating with other services. Keep your inventory control tight with your functions and consumed dependencies, that is, know which packages you are consuming and which known defects they have, and know how many and which functions are in production, as discussed in the Consuming Free and Open Source subsection.
Test removing permissions and see if everything still works. If it does, your permissions were too open; reduce them
AWS Service Roles, grant the AWS Lambda service permissions to assume your role by choosing
Resource’s of the chosen policy.
AWSLambdaBasicExecutionRole if your Lambda function only needs to write logs to CloudWatch,
AWSLambdaKinesisExecutionRole if your Lambda function also needs to access Kinesis Streams actions,
AWSLambdaDynamoDBExecutionRole if your Lambda function needs to access DynamoDB streams actions along with CloudWatch, and
AWSLambdaVPCAccessExecutionRole if your Lambda function needs to access AWS EC2 actions along with CloudWatch
AWS Lambda allows you to throttle the concurrent execution count. AWS Lambda functions being invoked asynchronously can handle bursts for approximately 15-30 minutes. Essentially, if the default is not right for you, then you need to define the policy, that is, set reasonable limits (a sketch follows below). Make sure you do this!
Drive the creation of your functions the same way you would drive any other production quality code… with unit tests (TDD), that is in isolation. Follow that with integration testing of the function in a production like test environment with all the other components in place. You can mock, stub, pass spies in the AWS:
Set up billing alerts (a sketch also follows below).
Be careful not to create direct or indirect recursive function calls.
Using an application firewall, as I discuss in the Web Applications chapter under the “Insufficient Attack Protection” subsection, may provide some protection if your rules are adequate.
Consider how important it is to scale compute to service requests. If it is more important to you to have a fixed price, knowing how much you are going to be charged each month, consider fixed price machine instances.
You should also be sending your logs to an aggregator, and not during your function’s execution time. Whatever your function writes to stdout is captured by Lambda and sent to CloudWatch Logs asynchronously; that means consumers of the function will not take a latency hit, and you will not take a cost hit. CloudWatch Logs can then be streamed to AWS Elasticsearch, which may or may not be stable enough for you. Other than that, there are not that many good options on AWS yet, besides sending to Lambda, which of course could also end up costing you compute and being another DoS vector.
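The following are minimal boto3 sketches of two of the points flagged above. First, capping a single function’s concurrency so a flood of invocations cannot consume the whole account level limit (and your budget); the function name and limit are hypothetical.

```python
# A minimal sketch of capping one function's concurrency.
import boto3

aws_lambda = boto3.client("lambda")

aws_lambda.put_function_concurrency(
    FunctionName="thumbnail-generator",   # hypothetical function
    ReservedConcurrentExecutions=25,      # this function can never exceed 25 at once
)
```

Second, a billing alert: a CloudWatch alarm on the account’s estimated charges that notifies an SNS topic. Billing metrics only exist in us-east-1 and must be enabled in the billing preferences first; the topic ARN and threshold are hypothetical.

```python
# A minimal sketch of a billing alert on the account's estimated charges.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="estimated-charges-over-100-usd",
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    Statistic="Maximum",
    Period=21600,                     # six hours
    EvaluationPeriods=1,
    Threshold=100.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],  # hypothetical topic
)
```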
The following are supposed to make the exercise of deploying your functions to the cloud easier:
The Serverless framework currently has the following provider APIs:
Claudia.JS: Specific to AWS and only covers Node.js. Authored by Gojko Adzic; if you have been in the industry as a Software Engineer for long, that fact alone may be enough to sell it.
Zappa: Specific to AWS and only covers Python.
Storing infrastructure and configuration as code is an effective measure for many mundane tasks that people may still be doing that are prone to human error. This means we can sequence specific processes, debug them, source control them, and achieve repeatable processes that are far less likely to have security defects in them, providing those that are writing the automation are sufficiently skilled and knowledgeable on the security topics involved. This also has the positive side-effect of speeding processes up.
When an artefact is deployed, how do you know that it will perform the same in production as it did in development? That is what a staging environment is for. A staging environment will never be exactly the same as production unless your infrastructure is codified; this is another place where containers can help. Using containers, you can test the new software anywhere and it will run the same, providing its environment is the same; in the case of containers the environment is the image, and that is what you ship.
The container goes from the developer’s machine, once tested, to the staging environment, then to production. The staging environment in this case is less important than it used to be; it is just responsible for testing your infrastructure, which should all be built from source-controlled infrastructure as code, so it is guaranteed to be repeatable.
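As a small sketch of that repeatability, the snippet below drives a source-controlled CloudFormation template through boto3, so every environment is built by the same process. The template file name and stack name are placeholders.

```python
# Sketch: build an environment from a source-controlled template.
# "network.yml" and the stack name are placeholders.
import boto3

cloudformation = boto3.client("cloudformation")

with open("network.yml") as template_file:
    template_body = template_file.read()

cloudformation.create_stack(
    StackName="staging-network",
    TemplateBody=template_body,
    Capabilities=["CAPABILITY_NAMED_IAM"],
)
```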
The docker-compose.yml file, orchestration platforms, and tooling take us toward “infrastructure as code”.
Add a password to the default user.
We have covered the people aspects, along with exploitation techniques, of Weak Password Strategies in the People chapter of Fascicle 0.
We have covered the technical aspects of password strategies in the Review Password Strategies subsection of the VPS chapter.
The risk is simply lack of knowledge, the speed at which the technological solutions are changing, and the fact that you must keep up with them.
There may not be any risks by simply asking the questions and analysing the responses. This is all part of helping you to build a better picture of where your current or prospect CSP is at on the security maturity model, thus helping you ascertain whether you should stick with them, select them as your CSP, and/or what your responsibility needs to be to achieve the security maturity you require.
In the Countermeasures section I provided a good number of points to consider, rather than outright solutions. These mostly depend on specifics of your organisation, which you will have to weigh up. There is no one answer for all here. Consider all options and make the decision based on what suits your organisation the best.
Refer to the “Risks that Solution Causes” subsection of the People chapter of Fascicle 0.
Refer to the Risks that Solution Causes subsection of the Web Applications chapter.
Refer to the Risks that Solution Causes subsection of the Network chapter.
Granting the minimum permissions required takes more work because you have to actually work out what is required.
AWS, like many other CSPs, provides many great tools to help us harden our configuration and infrastructure. If we decide not to take our part of the shared responsibility model seriously, then it is just a matter of time before we are compromised.
This involves a good sense of smell to sniff out all the possible leaking secrets. This sense may need to be developed.
The biggest issue I see in these situations is company culture. This needs to be attacked from both bottom up and top down.
Taking away your Developers’ SSH access will not work unless the work that they would need SSH access for is automated, which of course is usually the best option anyway.
The Countermeasures just require some thought, establishing process and not a lot more.
It could be slightly inconvenient to maintain multiple users, rather than having everyone share a single user account.
Password databases/managers can provide a huge improvement over not using them and resorting to storing secrets in insecure places such as unencrypted files, on post-it notes, and using the same or similar passwords across many accounts. If a password database is used correctly, all passwords will be unique and unmemorable.
One risk is that using a password database without changing habits like the above may improve your security ever so slightly at best. You must change the way you think about passwords and other secrets you enter manually.
There are also tools that can break password databases. Understand how they work and make sure you do not provide the conditions they require to succeed. The shorter the duration your password database is open, the less likely an attack will be successful. Also configure the duration that a secret stays in memory to the minimum possible.
In order for an application or service to access the secrets provided by one of these tools, it must also be able to authenticate itself, which means we have replaced the secrets in the configuration file with another secret to access that initial secret, thus making the whole strategy not that much more secure, unless you are relying on obscurity. This is commonly known as the secret zero problem.
Many of the gains that attract people to the serverless paradigm are offset by the extra complexity that must be understood in order to secure the integration of the components. There is a real danger that Developers fail to understand and implement all the security countermeasures required to reach a similar security standpoint to the one they enjoyed when their components were less distributed and running in long-lived processes.
API keys are great, but not so great when they reside in untrusted territory, which in the case of the web is any time your users need access to your API; anyone permitted to become a user is permitted to send requests to your API.
Do not depend on client-side API keys for security; they are a very thin layer of defence. You cannot protect API keys sent to a client over the internet. Yes, we have TLS, but that will not stop an end user masquerading as someone else.
Also consider anything you put in source control, even if not public, as already compromised. Your source control is, at best, only as strong as the weakest password of any given team member. You also have build pipelines that often leak, along with other leaky mediums such as people.
AWS as the largest CSP is a primary target for attackers.
The risk with application security is that it needs to be learnt, and most Developers currently do not have a great understanding of it.
Some trial and error is probably necessary here, just make sure you err on the side of caution, else you could be up for some large billing stress, and this may not even be from an attacker, it could simply be due to your own coding or configuration mistakes as already mentioned.
These frameworks may lead the Developer to think that the framework does everything for them; it does not. So although using a framework will abstract some operations, it is another thing to learn.
Time is required to codify and automate your infrastructure and configuration. Creating a repeatable process that continues to add the same bugs is a risk.
Relying on tooling alone to provide visibility on errors and defects is a risk.
My hope is that I have provided enough visibility into your responsibility throughout this chapter that you will have a good understanding of what you need to do to keep your environments as secure as is required for your particular business model.
There is quite a bit to be done here, but it all depends on the answers you receive. I have provided scenarios and many points to consider in the Countermeasures section. You can evaluate these now if you have not already.
Refer to the “Costs and Trade-offs” subsection of the People chapter of Fascicle 0.
Refer to the Costs and Trade-offs subsection of the Web Applications chapter.
Refer to the Costs and Trade-offs subsection of the Network chapter.
It is worth investing the effort to make sure only the required user permissions are granted. As discussed, there are tools you can use to help speed this process up and make it more accurate.
You need to have these tools set up so that they are continually auditing your infrastructure and notifying the person(s) responsible for and/or who care about the issues, rather than having people continually reviewing settings, permissions, and so forth manually.
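A minimal sketch of what such a scheduled audit job might look like follows; it flags IAM users without MFA and lists their active access keys. Pagination is omitted for brevity, and the print calls stand in for whatever notification channel you actually use.

```python
# Sketch: a small scheduled audit that notifies people, rather than relying on
# manual review. Pagination omitted; print stands in for your alerting channel.
import boto3

iam = boto3.client("iam")

for user in iam.list_users()["Users"]:
    name = user["UserName"]
    if not iam.list_mfa_devices(UserName=name)["MFADevices"]:
        print(f"{name}: no MFA device configured")
    for key in iam.list_access_keys(UserName=name)["AccessKeyMetadata"]:
        if key["Status"] == "Active":
            print(f"{name}: active access key {key['AccessKeyId']} created {key['CreateDate']}")
```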
Your bastion host will be hardened as discussed throughout the VPS chapter. All authorised workers can VPN to the bastion host and SSH from there, or just SSH tunnel from wherever they are on the planet through the bastion host, via port forwarding, to any given machine instance.
If you have Windows boxes you need to reach, you can tunnel RDP through your SSH tunnel as I have blogged about.
A second option with SSH (using the -A option) is, rather than tunnelling, to hop from the bastion host to your machine instances by forwarding the private key. This does carry the risk that someone could gain access to your forwarded SSH agent connection, and thus use your private key while you have an SSH connection established. ssh-add -c can provide some protection against this.
If you do decide to use the -A option, then you are essentially treating your bastion host as a trusted machine. I commented on the -A option in the Tunneling SSH subsection of the VPS chapter. There is plenty of good documentation around setting up a bastion host in AWS. AWS provides some best practices for security on bastion hosts, and also discusses recording the SSH sessions that your users establish through a bastion host for auditing purposes.
Credential and key theft are right up there with the most common attacks. This is not the place to skimp on costs.
Org charts, in contrast, do not show how influence takes place in a business. In reality, businesses do not function through the organisational hierarchy but through their hidden social networks.
People do not resist change or innovation, but they resist the insecurity created by change beyond their influence.
Have you heard the argument that “the quickest way to introduce a new approach is to mandate its use”?
A level of immediate compliance may be achieved, but the commitment will not necessarily be achieved (Fearless Change 2010).
If you want to bring change, the most effective way is from the bottom up. In saying that, bottom-up takes longer and is harder. Like anything. No pain, no gain. Or as my wife puts it… it is the difference between instant coffee and espresso.
Top-down change is imposed on people and tries to make change occur quickly, dealing with problems such as rejection and rebellion only if necessary. Bottom-up change is triggered at a personal level and focuses first on obtaining trust, loyalty, and respect, by serving as in servant leadership, and on earning the right to speak (have you served your time, done the hard yards?).
Because the personal relationship and involvement are not usually present with top-down change, people will appear to be doing what you mandated, but secretly will still be doing things the way they always have.
The most effective way to bring change is on a local and personal level once you have built good relationships of trust.
By automating the work usually done manually by a Developer with SSH access, you are investing a little time in order to do a mundane job that once automated can be done many times without requiring the concentration and time of a human. This can end up being a huge cost saving as well as increasing your security.
No trade-offs required.
It may cost you a little to set up and maintain additional accounts. This is essential if you want to retain ownership of your CSP account and resources.
It costs little to do a little research on the most suitable team password database for you and to implement it. I provided a good selection of factors to consider when reviewing and making your selection in the Countermeasures subsection.
All of security is a deception. By embracing defence in depth, we make it harder to break into systems, which just means it takes longer and someone may have to think a little harder. There is no secure system. You decide how much it is worth investing to slow your attackers down. If your attacker is 100% determined and well resourced, they will own you eventually no matter what you do.
Securing Serverless currently is hard because the interactions between components are all in the open and on different platforms, requiring different sets of users and privileges. Trust boundaries are disparate.
At the same time, Serverless does push the Developer into creating more testable components.
If you are going to go Serverless, which I admit does have some very compelling reasons, make sure you invest heavily into implementing the Countermeasures I discussed.
My hope is that after consuming this book series, you will be in a much better place to apply application security, and will understand how the Permissions Model works, being not only able to apply it but actually doing so.
Yes, you need to spend some time here up front making sure you configure your duration and invocation counts conservatively, as well as setting alarms.
If you have time to learn another framework then go for it, this may take some trial and error to know whether the framework provides the right level of abstraction for you, while still providing enough break-out to obtain the low level control you may need.
You will need to weigh up the life of your project/product and the cost of codifying parts of your infrastructure and configuration. In most cases projects will benefit from some initial outlay in this, the pay off will usually be realised reasonably quickly.
Understand the underlying issues, errors and defects that the tooling options operate on, and do not depend solely on tools to inform you of problems.