AWS Config: Know before you take a plunge!

Knowing is half the battle won, a wise soul once said. History is testimony to it. Knowing nothing, or knowing only a little, is a recipe for disaster.

The phrase resonates not just in battles but also in today's technical environments. Modern workloads (read: cloud) are no less treacherous than battlefields. Changes happen every moment: resources are created and destroyed at the whim of some weird elasticity rules, configurations are updated with each deployment or production issue, and so on. To manage your workloads, you need up-to-date information on them. And not just on the workloads, but also on the surrounding systems with which they interact - the relationships.

Who knows AWS better than AWS itself? So when AWS brings its own native service, which promises "knowing everything" about and "protecting" AWS workloads, why should one not believe it?

AWS Config debuted in Nov 2014 with the promise of complete visibility into AWS resources, with access to all the configuration history, and with the grand promise of being the configuration auditor of AWS. When we look back at that promise after 7 long years, one starts wondering: why are there so many CSPM startups after all?

AWS Config is an AWS configuration auditor. A configuration auditor is especially useful in:

Cloud Asset Inventory Management
Cost Management
Change Management and
Security Management

The AWS Maze

AWS is a huge conglomerate of services. These services are an abstraction layer on top of basic compute, network and storage services. For example, if you take a look at AWS Lambda, behind the scenes it uses EC2 compute infrastructure, uses its network infrastructure for data flow, and is isolated from other workloads using VPC technology. When this service is made available to the user, it is exposed through a defined set of APIs (which are wrapped in AWS SDKs for easy adoption and use), and the user can control some aspects of the service configuration. As an abstracted service view, AWS exposes certain entities to the user. These are commonly called Cloud Assets or Cloud Resource types.

In the case of AWS Lambda, a user writes code as Lambda Functions, while reusable code is packaged in the form of Layers. Each such layer can be versioned as a Layer Version. From the security point of view, you need to be absolutely sure that only your code runs as Lambda, which is accomplished in part by code signing. This code signing is done by a different AWS service, AWS Signer. A user needs to provide the inputs for code signing in the form of a Code Signing Configuration. A Lambda function can be invoked in response to an event. The source of these events, say SQS, needs to be configured. This configuration is captured as an Event Source Mapping. If you're running a mission-critical workload in Lambda, you may want to avoid cold starts. This is done through a Provisioned Concurrency Config. And I'm not even covering Function EventInvoke Config. Now step back a little and take a look at all the named entities in this paragraph, from Lambda Function to Provisioned Concurrency Config. Those are cloud assets.

Service	Resource Type	AWS Service Relationships
Lambda	Lambda Function	AWS Signer
	Layer Version	SQS
	Code Signing Configuration	Kinesis
	Event Source Mapping
	Provisioned Concurrency Config
	Function EventInvoke Config

The cloud is a complex maze of cloud assets and the relationships between them. These are the asset types and services that one has to deal with, every one of them, if you really care about security, cost, and operations.

AWS Lambda Service and relationships

Ignorance is not bliss in the cloud. For example, if you're blind to a provisioned concurrency configuration, it is going to have a cost impact: the function becomes ineligible for the free tier, and you will be paying for the reserved capacity. Not just that, the Provisioned Concurrency level counts toward the function’s Reserved Concurrency limit and also toward the account's regional limits. It is common for such values to be tweaked during incident resolution in production and then forgotten.

Traveling down a foggy road

The first tenet of any configuration auditor is visibility. Security follows visibility.

AWS Config, unfortunately, has a very poor record on this front. In Mar 2019, it supported 26 AWS services and 72 resource types. By 2021, AWS Config managed to increase its coverage by a whopping 43% to 103 resource types. It is quite an effort, isn't it?

Now look at the number of AWS services as of Jul 2021: approximately 230. Terraform, a de facto IaC standard, has a notion of resources that correspond to configuration items. Going back to our AWS Lambda example, this one-to-one correspondence can be clearly seen. Terraform supports ~630 resource types. Another IaC tool, AWS's own CloudFormation, supports ~330 resource types. That puts AWS Config coverage at roughly 31% when compared to CloudFormation and roughly 16% when compared to Terraform. And a common complaint about CloudFormation is that it lags way behind Terraform.
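The arithmetic behind these ratios is easy to check. The sketch below only reuses the approximate counts quoted in this article; the helper name percent is made up for illustration. With these rounded counts, the CloudFormation ratio lands right around the 30% mark and the Terraform ratio well under 20%.

```python
# Approximate resource-type counts quoted in the article (as of Jul 2021).
CONFIG_2019 = 72
CONFIG_2021 = 103
CLOUDFORMATION = 330
TERRAFORM = 630


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Growth of AWS Config coverage between Mar 2019 and 2021.
growth = percent(CONFIG_2021 - CONFIG_2019, CONFIG_2019)

# AWS Config coverage relative to the two IaC tools.
vs_cloudformation = percent(CONFIG_2021, CLOUDFORMATION)
vs_terraform = percent(CONFIG_2021, TERRAFORM)

print(f"Growth 2019 -> 2021:       {growth}%")
print(f"Coverage vs CloudFormation: {vs_cloudformation}%")
print(f"Coverage vs Terraform:      {vs_terraform}%")
```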

Even within the AWS services that AWS Config supports, many resources are not tracked. You can see from the AWS Config page that even common AWS services like Route53 are not supported. In fact, support for SQS, AWS’s oldest service, was added only in 2020.

AWS Config Coverage

The AWS resource inventory forms the foundation for cost, security and management, so this lack of coverage in AWS Config is discouraging.

Nothing is as it should be

Another stated purpose of a configuration auditor is change management. This is especially important when dealing with incidents. Consider, for example, the rules in a load balancer or the min/max of an auto-scaling group. These things are often changed manually during production issues, and if you don't have backups, it's quite easy to misconfigure them and then spend time trying to recall the previous values. This is where configuration change history shines.

Think of it as Git versioning, but applied to the cloud. Having an established baseline or golden snapshot is often a good idea. Obviously, IaC should be the first choice for establishing the baseline, but reality often differs from the ideal. Even in companies that have invested heavily in IaC, there are often resources that were created long ago and have never been brought under IaC.

AWS Config has useful features like the resource change timeline. If you need to view information across accounts and regions, though, an aggregator needs to be created. The expectation from AWS Config is to provide this versioning in a handy way.
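To make the "Git for the cloud" idea concrete, here is a minimal sketch of comparing a golden snapshot with the current state. It is not how AWS Config works internally; the function name diff_config and the auto-scaling values are hypothetical and exist only to illustrate the comparison.

```python
from typing import Any, Dict, Tuple


def diff_config(
    baseline: Dict[str, Any], current: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (baseline_value, current_value)} for every key that differs.

    A key missing on one side is reported with None on that side.
    """
    changes: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(baseline.keys() | current.keys()):
        old = baseline.get(key)
        new = current.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Hypothetical auto-scaling group settings: the golden snapshot versus the
# values left behind after a production incident.
baseline = {"MinSize": 2, "MaxSize": 6, "DesiredCapacity": 2}
current = {"MinSize": 2, "MaxSize": 20, "DesiredCapacity": 12}

drift = diff_config(baseline, current)
for key, (old, new) in drift.items():
    print(f"{key}: {old} -> {new}")
```

With a recorded baseline, recalling the previous values after an incident becomes a lookup rather than a memory exercise.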

AWS Config is a regional service, meaning you need to set it up in all regions for all AWS accounts. When you have a good number of AWS accounts, this quickly translates into a sizeable effort. This is a hindrance to adoption. Often, AWS Config is not enabled in regions where you do not work. If AWS Config is not enabled for a region, that region becomes a blind spot. You need to make sure to turn it on in every region, not just the ones you work in.

In short, AWS Config demands quite a lot of effort from its users when configuring the service for multiple accounts and regions. This includes:

Enabling configuration recorder for each region.

Deciding to either cover all resource types or specific resource types.

Creating an aggregator for multi-account, multi-region visibility.

Remembering to set up the recorder in every newly added region, for each account.

Updating the aggregator to include the new region.

The multi-step, per-region setup of AWS Config requires sizeable effort, which further limits visibility.
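A rough way to spot such blind spots is to ask each region whether a recorder is actually recording. The sketch below assumes a client object per region that offers describe_configuration_recorder_status, which is the call the boto3 Config client provides. The names regions_without_recording, client_for and StubConfigClient, as well as the region states, are hypothetical; the stub stands in for real clients so the example runs without AWS credentials.

```python
from typing import Any, Callable, Dict, Iterable, List


def regions_without_recording(
    regions: Iterable[str], client_for: Callable[[str], Any]
) -> List[str]:
    """Return the regions where no configuration recorder is currently recording."""
    blind: List[str] = []
    for region in regions:
        status = client_for(region).describe_configuration_recorder_status()
        recorders = status.get("ConfigurationRecordersStatus", [])
        # A region is a blind spot if it has no recorder or all are stopped.
        if not any(r.get("recording") for r in recorders):
            blind.append(region)
    return blind


class StubConfigClient:
    """Stand-in for a per-region Config client (illustration only)."""

    def __init__(self, recorders: List[Dict[str, Any]]) -> None:
        self._recorders = recorders

    def describe_configuration_recorder_status(self) -> Dict[str, Any]:
        return {"ConfigurationRecordersStatus": self._recorders}


# Hypothetical state of one account across three regions.
STUB_STATE: Dict[str, List[Dict[str, Any]]] = {
    "us-east-1": [{"name": "default", "recording": True}],
    "eu-west-1": [{"name": "default", "recording": False}],  # recorder stopped
    "ap-south-1": [],  # recorder never created
}

blind_spots = regions_without_recording(
    STUB_STATE, lambda region: StubConfigClient(STUB_STATE[region])
)
print(blind_spots)
```

With boto3 installed and credentials configured, client_for could be something like lambda region: boto3.client("config", region_name=region), and the same check would have to be repeated for every account.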

And it costs nothing

The AWS Config pricing model is volume-based and has multiple parts, making it complex and hard to predict.

Each recorded resource change costs $0.003. Mind that a recording is counted whenever a configuration or a relationship changes. In a highly dynamic environment, this can quickly become unpredictable. For example, ECS tasks running in awsvpc networking mode attach and detach ENIs: does each of those count as a configuration recording? The same applies to Lambda functions running in the user's VPC. You need to be careful of situations where an accidental configuration change may trigger volume-based charges.

Another cost contribution comes from rule execution, which is again volume-based: the first 100,000 rule evaluations cost $0.001 per evaluation per region, and so on. In addition, there is a cost associated with the objects (snapshots and history) saved in the S3 bucket.

In short, remember that AWS Config pricing is volume-based. If you have a dynamic environment, the cost can be unpredictable.

Volume not only impacts the cost; it may also affect how long AWS Config takes to detect configuration changes. AFAIK, there is no official documentation that gives a clear SLA between a resource change or creation and Config detecting and evaluating it. The higher the activity, the longer the delay.
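A back-of-the-envelope estimate shows how quickly volume moves the bill. The sketch below uses only the two unit prices quoted above and assumes rule evaluations stay within the first pricing tier; S3 storage is left out. The function name monthly_estimate and the monthly volumes are hypothetical.

```python
PRICE_PER_RECORDING = 0.003  # USD per recorded change, as quoted above
PRICE_PER_RULE_EVAL = 0.001  # USD per evaluation; assumes the first tier only


def monthly_estimate(recordings: int, rule_evaluations: int) -> float:
    """Estimate the monthly AWS Config charge in USD, excluding S3 storage."""
    cost = recordings * PRICE_PER_RECORDING + rule_evaluations * PRICE_PER_RULE_EVAL
    return round(cost, 2)


# Hypothetical monthly volumes for a quiet and a highly dynamic environment.
quiet = monthly_estimate(recordings=5_000, rule_evaluations=10_000)
busy = monthly_estimate(recordings=500_000, rule_evaluations=100_000)

print(f"Quiet environment: ${quiet}")
print(f"Busy environment:  ${busy}")
```

The unit prices are the same in both cases; only the churn differs, and the churn is the part you often cannot forecast.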

So if you're considering AWS Config as a configuration auditor, you should consider:

Limited visibility into cloud resources

Region-based setup effort

Usability

Unpredictable pricing model