The cloud is becoming increasingly popular in our day and age. It is the driving force of the modern world, and engineers are starting to shift their careers accordingly. Whatever your profession, chances are that you will have to work with the cloud in some way or another. Terms such as VPCs, subnets, security groups, ECS, and so on will no longer sound unfamiliar. But have we really grasped the gravity of this seismic shift?
First and foremost, most of us have so far been concerned mainly with cost when we thought about the cloud. Cost savings were the primary motive during the first wave of cloud adoption, and the agility built into cloud platforms accelerated it. Hence many of the SaaS companies and consultancies that emerged during this first wave focus on cost optimization. Tools that identify idle cloud resources, schedule cloud shutdowns, or fetch the cheapest spot instance are prime examples. That is understandable. But there is a more fundamental, and arguably more important, aspect that we need to pay attention to: Security.
Security is often an afterthought for many of us. We are engineers who love to solve problems, design solutions, and quickly deploy them to the cloud for that sense of achievement. And that is exactly where this critical security piece needs to be thought of. With each new cloud service that makes its debut, the cloud is becoming a denser place, and we had better understand a few basic aspects of it to make the journey more fun, secure, and navigable.
I come from a background where, as an engineer, I did not have to bother about security beyond the ports used for communication among components, encryption and decryption of application data, and so on. The big guns of security (network security, application security, endpoint security, and the like) were taken care of by experts. I would simply write my code, integrate it well to make it a whole working machine, and I was done. Whatever happened to the application after that was offloaded to the deployment teams, which had network engineers, database admins, operations engineers, and an army of experts. I would seldom interact with them. In short, I never bothered about security in the broader sense.
But the cloud has changed everything, and we all now know about DevOps. It's a phrase that has been thrown around ever since 2009, when Patrick Debois coined the term to describe the growing movement of software delivery in both small startups and large enterprises. I won't get into that. The point is that all those security and deployment aspects I never bothered with are now a part of my daily life.
Why is that happening now? There are two main reasons. First, the cloud is an abstraction that works with programmable APIs, and we use these APIs to build our applications. Second, the abstraction wraps the physical aspects of the hardware and instead gives us software constructs such as IAM, Security Groups, and Subnets. Many of us do not realize the implications of these two together: we are now the engineers who indirectly control many physical aspects, and because this happens through APIs, the speed at which it happens has increased tremendously. In short, the work that experts used to do for us now needs to be taken care of by us.
And that is what the Shared Responsibility model is all about. Most of the security onus is on us. In fact, I’m sure many of you have already encountered this Gartner warning somewhere:
Through 2025, 99% of cloud security failures will be the customer’s fault.
AWS is expanding its offering aggressively. At this moment there are 200+ services on offer. A few of them, such as EC2, RDS, S3, SQS, and Lambda, are used heavily, while others are used marginally. However, there is one service that you have to deal with no matter what: IAM (Identity and Access Management). This service forms the backbone of the AWS cloud and, as you may already know, deals with identity and authorization. In essence, it is the fabric of the AWS cloud: almost every API call needs to pass an IAM check, resulting in a massive throughput of 400M requests per second.
Understanding IAM Model
IAM is the new network, especially in the serverless world. Understanding IAM is crucial for any well-written application and its secure deployment in the cloud. In a nutshell, IAM is about understanding which identities have which access, and to what degree.
An identity could be anything: a user logged in to a console session, an API caller using an STS token, compute such as EC2, or even abstracted compute such as a Lambda function. It is critical to understand that almost anything in the cloud can assume an identity. That means any cloud resource (think of a cloud resource as any entity that provides compute, network, or storage) can assume an identity, which has wide-reaching consequences.
In AWS parlance, an identity is called a ‘Principal’. Principals can be thought of as the actors in the cloud. The permissions granted to a principal determine which Actions it is allowed to perform.
Actions can be anything that can be initiated with an API call. Each cloud resource type has its own set of actions that can be carried out. For example, S3 has actions such as CreateBucket, DeleteBucket, and ListBucket. The action is the second piece of information that forms the IAM model.
IAM also controls which Resources these actions can be performed on. Simply allowing actions on resources would be too simplistic, and that is where Conditions come into the picture. Conditions check whether specific criteria, such as principals or tags, are met before the action is allowed. For example, S3 has conditions such as TagKeys and Prefix. Actions on any resource are denied to begin with, which ensures that an action cannot be carried out unless it has been explicitly allowed. One needs to explicitly Allow actions on a resource.
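To make the interplay of Actions, Resources, Conditions, and the default Deny concrete, here is a deliberately simplified sketch in Python. It is not how AWS implements evaluation: the function names are my own, the condition check is a toy equality test on a flat dictionary (real policies nest conditions under operators such as StringEquals), and real evaluation also involves other policy types. It only illustrates two rules: nothing is allowed unless a statement allows it, and an explicit Deny wins over any Allow.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """Policy fields may hold a single string or a list of strings."""
    return [value] if isinstance(value, str) else list(value)


def _matches(patterns, value):
    """True if value matches any pattern; '*' acts as a wildcard."""
    return any(fnmatchcase(value, pattern) for pattern in _as_list(patterns))


def is_allowed(statements, action, resource, context=None):
    """Return True only if some Allow statement matches and no Deny does."""
    context = context or {}
    allowed = False  # implicit deny: nothing is permitted by default
    for stmt in statements:
        if not _matches(stmt["Action"], action):
            continue
        if not _matches(stmt["Resource"], resource):
            continue
        # Toy condition check (an assumption of this sketch): every listed
        # key must equal the corresponding value in the request context.
        conditions = stmt.get("Condition", {})
        if any(context.get(key) != value for key, value in conditions.items()):
            continue
        if stmt["Effect"] == "Deny":
            return False  # an explicit Deny overrides any Allow
        allowed = True
    return allowed


# Hypothetical statements for illustration only.
statements = [
    {
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
        "Condition": {"s3:prefix": "reports/"},
    },
    {
        "Effect": "Deny",
        "Action": "s3:*",
        "Resource": "arn:aws:s3:::example-bucket/secret/*",
    },
]
```

With these statements, a GetObject request on `example-bucket/reports/a.csv` with the matching context is allowed, while any request outside the Allow statement, or inside the `secret/` prefix, is refused.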
When we put all these four terms together in one diagram the IAM model becomes easier to understand:
If you now look at the above diagram, it becomes clear that cloud resources can be both Principals and Resources. That means, in practice, that a Lambda function can invoke another Lambda function, an SQS queue can trigger a Lambda function, and so on.
If you pause for a moment and look at this, you will notice that resources can now have direct and indirect relationships, some obvious and some not so obvious. These relationships can form a chain or a graph and may have disastrous consequences if not understood and dealt with properly. I will come back to this important observation later and cover it in detail.
The IAM model helps us understand things at a conceptual level. What realizes this model in practice is the IAM Policy. A policy is a document that combines all four parts. An IAM Policy can be attached to identities such as users, groups, and roles, or to cloud resources, which then acquire the permissions (to perform specific actions within the constraints of the conditions) listed in the policy.
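One way to picture such a chain is as a directed graph in which an edge means "can act on". The sketch below is only an illustration under that assumption: the resource names and edges are hypothetical, and in a real account they would have to be derived from the actual IAM policies. It uses a breadth-first search to list everything a starting resource can reach, directly or indirectly.

```python
from collections import deque


def reachable(edges, start):
    """Return every node reachable from start by following 'can act on' edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in edges.get(node, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical relationships: each key may act on the listed targets.
edges = {
    "queue:orders": ["lambda:process-order"],
    "lambda:process-order": ["lambda:send-receipt", "bucket:invoices"],
    "lambda:send-receipt": ["bucket:customer-data"],
}

indirect_reach = reachable(edges, "queue:orders")
```

Here the queue only triggers one function directly, yet `indirect_reach` also contains `bucket:customer-data`, which is exactly the kind of non-obvious relationship worth knowing about.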
IAM Policy is like a key, which can change hands easily. Just be careful to whom you hand over keys directly or indirectly.
Policies come in two major types.
Identity-based policies: A policy is in effect only when it is attached to an IAM entity (user, group or role). When a policy specifying a Resource is attached to a user, this user is the Principal (Actor) of the action. Let's take a look at a simple IAM policy like the one below:
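As a minimal sketch, an identity-based policy of this kind could look like the following, written here as a Python dictionary and printed as the JSON document IAM expects. The variable name is my own, and I am assuming the `s3:ListAllMyBuckets` action, which is the S3 action for listing the buckets in an account. Note that there is no Principal field: the principal is whoever the policy is attached to.

```python
import json

# Identity-based policy: the principal is implied by the attachment,
# so the document carries only Effect, Action, and Resource.
list_buckets_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "*",
        }
    ],
}

print(json.dumps(list_buckets_policy, indent=2))
```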
In essence, this policy allows the attached entity to list all S3 buckets in the account. Thus, if this policy is attached to a Lambda function (through a service role), the function can list all S3 buckets, and if it is attached to an EC2 instance, the instance can do the same.
Can there be scenarios with no identity? Absolutely! This is the case for anonymous access, when an AWS service such as API Gateway does not use a service role, or when cross-account access needs to be granted, as in the case of Lambda.
Now, when there is no identity, identity-based policies cannot be used, and the other variation of IAM Policy comes into the picture: resource-based policies. A resource-based policy is used when you wish to give a Principal (actor) access to a resource. In this policy, we specify who has access to the resource and what actions they can perform on it. Resource-based policies are supported by only a handful of AWS services.
Take a look at the S3 bucket policy below, which is used to give anonymous access to the bucket. In this policy, you will notice a new field, Principal; in this particular policy, ‘*’ in Principal means everybody, allowing anonymous access to the S3 objects. Thus, in resource-based policies, resources can control who has access to them.
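A minimal sketch of such a bucket policy, again as a Python dictionary, might look like this. The bucket name and the Sid are placeholders of my own; the important part is the Principal field set to "*", which is what opens the objects to everybody.

```python
import json

BUCKET_NAME = "example-bucket"  # hypothetical bucket name

# Resource-based policy: unlike an identity-based policy, it names the
# Principal explicitly. "*" means anyone, including anonymous callers.
public_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET_NAME}/*",
        }
    ],
}

print(json.dumps(public_read_policy, indent=2))
```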
Yet another very important class of policy, which is essentially a resource-based policy, is the Trust Policy. It allows services and identities to assume a role. For example, a cross-account access role can use the trust policy below to allow access from a different account. Many third-party SaaS services use this type of policy to access our AWS accounts.
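As a rough sketch, a cross-account trust policy could look like the following. The account ID and the external ID are placeholders I made up. The external ID condition is an assumption on my part: it is a common safeguard when a third party assumes a role in your account, but not every trust policy includes it.

```python
import json

TRUSTED_ACCOUNT_ID = "111122223333"   # hypothetical external account
EXTERNAL_ID = "example-external-id"   # hypothetical shared secret

# Trust policy: attached to a role, it states who may call sts:AssumeRole.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{TRUSTED_ACCOUNT_ID}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```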
This is the awesome power of IAM policy. And this is also the reason why one has to be absolutely aware of the content of the policy.
Any misconfiguration in IAM Policy can quickly escalate to be a security risk.
This brings us to the IAM best practices that should always be followed. Let's take a look at a few interesting observations regarding IAM policies.
- Least privilege (grant least privilege): Each IAM identity should have only the permissions that are absolutely required for the task at hand, no less and no more.
While we usually begin by following this principle, over time the temptation to reuse these IAM policies is hard to resist. That leads to fat IAM policies, which also become vulnerable from a security perspective.
As a rule of thumb, there should be one IAM policy per task. Resist the temptation to reuse policies.
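A quick way to spot policies that have grown fat is to look for wildcards. The helper below is a small sketch of that idea, not an established tool: the function name and the two rules it applies are my own choices, and the sample policy is hypothetical. Treat its findings as prompts for review rather than verdicts, since some actions legitimately require "*" as the resource.

```python
def find_broad_statements(policy):
    """Return (index, reason) pairs for Allow statements that use wildcards."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    findings = []
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        # "*" or "service:*" grants every action (of a service).
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.append((index, "wildcard action"))
        if "*" in resources:
            findings.append((index, "wildcard resource"))
    return findings


# Hypothetical policy: the first statement is far broader than the second.
sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-bucket/reports/*",
        },
    ],
}

findings = find_broad_statements(sample_policy)
```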
Following one IAM policy per task quickly leads to a large number of IAM policies in any sizeable cloud environment. If you ask anyone which IAM policy is mapped to which cloud resource, or whether an IAM policy is mapped to multiple cloud resources, answering becomes mostly guesswork (or a daunting task using the AWS CLI) unless you’re using a tool to help you.
We need a tool that analyzes all this information (IAM policies and cloud resources) and prepares some kind of map that visualizes all these relationships. There is already some interesting open source tooling in this area to identify and visualize IAM relationships:
Another problem with the least privilege principle is how to write a restrictive access policy. It is surprisingly hard to write such policies consistently at scale. For each such policy, you need to consult the AWS documentation for the actions specific to the cloud resource, which quickly becomes cumbersome in the real world. An alternative is to use attribute-based access control (ABAC) policies.
Alternatively, you may use a wonderful tool, iamlive, which generates an IAM policy using client-side monitoring. Note that you may need to put in some effort to make the result truly least privilege.
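To give a feel for the attribute-based approach, here is a sketch of a policy that allows starting and stopping EC2 instances only when the instance carries the same `project` tag as the caller. The tag key and the chosen actions are assumptions for illustration; the point is that one policy can serve many teams because the match happens on tags rather than on a hard-coded list of resources.

```python
import json

# ABAC sketch: access is granted when the resource tag equals the
# principal's tag. "${aws:PrincipalTag/project}" is an IAM policy variable
# that is resolved by AWS at evaluation time, not by Python.
abac_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ec2:StartInstances", "ec2:StopInstances"],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/project": "${aws:PrincipalTag/project}"
                }
            },
        }
    ],
}

print(json.dumps(abac_policy, indent=2))
```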
- Be careful with cross-account IAM roles: Cross-account roles open a door into even the most secure cloud environment. This can be dangerous, particularly when access is given to not-so-secure or unknown accounts, and hence we need to be careful about them.
You should always know which cross-account IAM roles exist in your environment.
Unfortunately, this is no easy task without the help of an external tool. AWS offers IAM Access Analyzer for this purpose; do give it a try. Personally, I feel it helps to visualize the cross-account roles to know whether only legitimate AWS accounts are accessing our environment. The relationship between our cloud account and external cloud accounts becomes clear and manageable with a clear visualization.
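For a first rough look, you can also scan the trust policies of your roles yourself. The sketch below is not a replacement for a proper analyzer: the function name and sample data are hypothetical, it only inspects the AWS entries of the Principal field, and it ignores conditions. It returns the account IDs, other than your own, that a trust policy lets in.

```python
import re

# Matches IAM/STS principal ARNs and captures the 12-digit account ID.
ACCOUNT_ID_IN_ARN = re.compile(r"^arn:aws[\w-]*:(?:iam|sts)::(\d{12}):")


def external_accounts(trust_policy, own_account_id):
    """Return the set of foreign account IDs (or '*') trusted by the policy."""
    found = set()
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        if principal == "*":
            found.add("*")  # anyone at all may assume the role
            continue
        aws = principal.get("AWS", [])
        for entry in ([aws] if isinstance(aws, str) else aws):
            if entry == "*":
                found.add("*")
                continue
            match = ACCOUNT_ID_IN_ARN.match(entry)
            # A principal may also be given as a bare 12-digit account ID.
            account = match.group(1) if match else entry
            if account != own_account_id:
                found.add(account)
    return found


# Hypothetical trust policy that lets one external account in.
sample_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::999988887777:root"},
            "Action": "sts:AssumeRole",
        },
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        },
    ],
}

outsiders = external_accounts(sample_trust_policy, own_account_id="111122223333")
```

Running this over every role's trust policy gives a simple list of external accounts to verify against the ones you actually expect.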
Here is a list of open source projects that can be useful either for writing secure policies or for identifying over-provisioned, fat IAM policies.
- github.com/Netflix/repokid: This tool can help remove access to unused services. It comes from Netflix, which means it is battle-tested. It relies on another Netflix tool that needs to be deployed, so it is a little cumbersome to deploy and use. Read this excellent post for details.
Another set of tools uses CloudTrail to analyze IAM policies.
- github.com/flosell/trailscraper: A downside of this tool is false positives. It is heuristic-based and attempts to map CloudTrail events to IAM actions, so it sometimes gets the mapping wrong.
- github.com/duo-labs/cloudtracker: This tool uses Athena for queries and may cost you some $$$.
- github.com/salesforce/policy_sentry: This tool was recently released and can be used to write least-privilege policies.
- As engineers, we need to understand the IAM model and master the art of writing secure IAM policies.
- Cloud resources can have direct and indirect relationships, and thus IAM can have far-reaching consequences.
- Visualizing these complex relationships will be very important going forward.
- Writing IAM policies that adhere to best practices is a daunting task, and some tools can really come in handy to make life a bit easier.