Understanding Claude API’s Usage Restrictions: Rate Limits Explained

Hello everyone. I’m AM from the Efficiency Improvement Team at Hirose Paper Mfg Co., Ltd.
Today, I’d like to share some insights on the rate limits you should be aware of when using the Claude API.
At our company, we’ve developed and are actively using a writing-assistance app, powered primarily by the Claude API, that helps draft articles for the “Employee Blog” section of our website. This article was also written with Claude’s help: the author is responsible for the content, and Claude helps shape the text into readable prose. Because the originality and core ideas must come from the writer, the quality of the final article ultimately depends on them. The goal isn’t to mass-produce AI-generated content, but even just having help with formatting the text has significantly reduced the burden on our authors.
While using this app, we unexpectedly ran into rate limits, which caused some disruption. I hope that sharing our experience will help others who are planning similar development projects.
Spend Limits vs. Rate Limits
The Claude API has two major types of usage restrictions: “Spend limits” and “Rate limits.”
Spend limits are relatively straightforward: you can set your own monthly spending cap (up to the maximum allowed for your tier), which makes them a handy cost-management feature.
Rate limits, on the other hand, are a bit more complex, and users cannot change them. The exact values vary somewhat by model, but the biggest factor is your usage tier, which ranges from Tier 1 to Tier 4, with each tier having its own set of limits.
Even though our tool is designed for internal use, dealing with rate limits turned out to be more troublesome than expected, so I’d like to focus on that aspect here.
Please note that the information in this article is current as of the time of writing. For the latest details, refer to the official documentation.
About the Tiers
The Claude API operates on a prepaid (deposit-based) system. Users deposit funds in advance, which are then deducted based on usage. The deposited amount determines the user’s tier, which comes in four levels.
To give you some concrete figures: Tier 1 starts at $5 or more, Tier 2 at $40 or more, Tier 3 at $200 or more, and Tier 4 at $400 or more. Since each tier has significantly different usage limits, selecting the appropriate tier based on your intended use is crucial.
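As a quick illustration, the thresholds above can be written as a small lookup. This is a minimal Python sketch: the names `TIER_THRESHOLDS` and `tier_for_deposit` are hypothetical, the dollar figures are the ones quoted in this article at the time of writing, and the sketch ignores the waiting period that an upgrade also requires.

```python
# Deposit thresholds (USD) quoted in this article at the time of writing.
# Check Anthropic's documentation for current values.
TIER_THRESHOLDS = [(4, 400.0), (3, 200.0), (2, 40.0), (1, 5.0)]


def tier_for_deposit(amount_usd: float) -> int:
    """Return the highest tier whose threshold `amount_usd` meets (0 if none).

    Hypothetical helper: it does not model the upgrade waiting period.
    """
    for tier, threshold in TIER_THRESHOLDS:  # checked from highest tier down
        if amount_usd >= threshold:
            return tier
    return 0


print(tier_for_deposit(30))   # 1
print(tier_for_deposit(40))   # 2
print(tier_for_deposit(250))  # 3
```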
For example, if you’re developing a lightweight tool for internal use, Tier 1 may seem like a suitable entry point. However, it’s important to be aware that the restrictions at this level can be unexpectedly strict.
Upgrading to a higher tier comes with a condition: you must keep your deposit at or above the target tier’s threshold continuously for 1 to 2 weeks, depending on the tier. If you anticipate heavier usage, it’s wise to deposit enough for the appropriate tier from the start.
Rate Limits for Tier 1
Tier 1 restricts the number of requests to 50 per minute. Additionally, the maximum number of input tokens per minute ranges from 20,000 to 50,000, and the output tokens are capped at 4,000 to 10,000 per minute (these vary slightly depending on the language model).
Personally, I found the per-minute output token limit particularly restrictive. In my experience, two generations of around 1,000 Japanese characters each are usually enough to use up this allowance, so when I try to generate a third passage, I often have to wait about a minute before proceeding (though this varies depending on several factors).
When developing applications that require multiple generations in quick succession or real-time text generation, you’ll need to account for this wait time in your implementation.
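One way to account for this wait in code is a client-side token bucket. Anthropic’s documentation describes its rate limiting as a token bucket whose capacity is replenished continuously rather than reset at fixed intervals, so a bucket that refills at (limit / 60) tokens per second is a reasonable approximation, though not the server’s actual implementation. This is a Python sketch under stated assumptions: the class `OutputTokenBucket` is hypothetical, and because output length is only known after the response arrives, it reserves an estimate (for instance the request’s `max_tokens`) before each call.

```python
import time
from typing import Optional


class OutputTokenBucket:
    """Hypothetical client-side model of an output-tokens-per-minute limit.

    The bucket holds up to `tokens_per_minute` tokens and refills continuously
    at tokens_per_minute / 60 tokens per second. Callers reserve an *estimate*
    of the output size before each request (an assumption of this sketch).
    """

    def __init__(self, tokens_per_minute: int, now: Optional[float] = None) -> None:
        self.capacity = float(tokens_per_minute)
        self.rate = tokens_per_minute / 60.0  # tokens regained per second
        self.tokens = self.capacity           # start with a full bucket
        self.updated = time.monotonic() if now is None else now

    def _refill(self, now: float) -> None:
        # Add back the tokens earned since the last update, never above capacity.
        if now > self.updated:
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now

    def wait_time(self, estimate: int, now: Optional[float] = None) -> float:
        """Seconds until `estimate` tokens are available (0.0 if available now)."""
        now = time.monotonic() if now is None else now
        self._refill(now)
        shortfall = estimate - self.tokens
        return 0.0 if shortfall <= 0 else shortfall / self.rate

    def consume(self, estimate: int, now: Optional[float] = None) -> None:
        """Reserve `estimate` tokens; the balance may go negative (a debt)."""
        now = time.monotonic() if now is None else now
        self._refill(now)
        self.tokens -= estimate


# Tier 1 example: 4,000 output tokens/minute, ~2,000 tokens per generation (assumed).
bucket = OutputTokenBucket(tokens_per_minute=4_000, now=0.0)
clock = 0.0
for n in range(1, 4):
    wait = bucket.wait_time(2_000, now=clock)
    print(f"generation {n}: wait {wait:.1f} s")
    clock += wait  # a real app would call time.sleep(wait) here
    bucket.consume(2_000, now=clock)
# generation 1: wait 0.0 s
# generation 2: wait 0.0 s
# generation 3: wait 30.0 s
```

In this model the wait depends on how far over budget you are, which is one reason real waits vary; pausing a full minute is simply the more conservative option.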
On the other hand, upgrading to Tier 2 relaxes the limits significantly. The request limit increases to 1,000 per minute, input tokens to 40,000–100,000, and output tokens to 8,000–20,000.
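To check a planned workload against these figures, the Tier 1 and Tier 2 limits quoted above can be kept as a small table. In this hypothetical Python sketch (`TIER_LIMITS` and `fits_tier` are illustrative names), each limit is stored as a (low, high) range because the exact value depends on the model, and the check uses the low end so that it should hold even for the most restrictive model in the tier.

```python
# Per-minute limits quoted in this article, stored as (low, high) ranges
# because the exact figures depend on the model. Values may have changed.
TIER_LIMITS = {
    1: {"requests": (50, 50), "input_tokens": (20_000, 50_000), "output_tokens": (4_000, 10_000)},
    2: {"requests": (1_000, 1_000), "input_tokens": (40_000, 100_000), "output_tokens": (8_000, 20_000)},
}


def fits_tier(tier: int, requests: int, input_tokens: int, output_tokens: int) -> bool:
    """True if a per-minute workload stays within the *lower* bound of every
    limit, i.e. it should fit even on the most restrictive model in the tier."""
    workload = {"requests": requests, "input_tokens": input_tokens, "output_tokens": output_tokens}
    return all(workload[name] <= low for name, (low, _high) in TIER_LIMITS[tier].items())


# Hypothetical workload: 3 generations/minute, ~3,000 input and ~2,000 output tokens each.
workload = dict(requests=3, input_tokens=9_000, output_tokens=6_000)
print(fits_tier(1, **workload))  # False (6,000 output tokens > 4,000)
print(fits_tier(2, **workload))  # True
```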
When You Hit the Limits
If you exceed the rate limits, you’ll need to wait about a minute before making another request. When this happens, the API returns HTTP status code 429, which your code can detect and treat as the signal to pause and retry.
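A 429 can be handled with a small retry loop. The sketch below is written without the SDK so that it runs on its own: `RateLimited` and `call_with_retry` are hypothetical names, and with Anthropic’s official Python SDK the exception you would catch is `anthropic.RateLimitError`. When the error carries no wait hint, the loop falls back to just over 60 seconds, the same fixed wait our app uses.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 response.

    With the official Python SDK you would catch anthropic.RateLimitError instead.
    """

    def __init__(self, retry_after: Optional[float] = None) -> None:
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_retry(
    send: Callable[[], T],
    max_attempts: int = 3,
    fallback_wait: float = 61.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `send()`, retrying after a pause whenever it raises RateLimited.

    Makes up to `max_attempts` calls in total (at least one) and re-raises
    the last error if every attempt is rate-limited.
    """
    attempt = 1
    while True:
        try:
            return send()
        except RateLimited as err:
            if attempt >= max_attempts:
                raise
            # Prefer the server's hint; otherwise wait just over a minute.
            sleep(err.retry_after if err.retry_after is not None else fallback_wait)
            attempt += 1


# Usage: a fake request that is rate-limited twice before succeeding.
calls = []


def flaky_send() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise RateLimited()
    return "ok"


waits = []
print(call_with_retry(flaky_send, sleep=waits.append))  # ok
print(waits)  # [61.0, 61.0]
```

Passing `sleep` as a parameter keeps the function testable; in production the default `time.sleep` is used. Note that the official SDK may already retry some errors, including 429s, on its own (the client’s `max_retries` option controls this), so check that behavior before layering your own loop on top.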
You can use information provided in the API response to deal with this. For example, the number of tokens used can be checked via `response.usage.input_tokens` for input and `response.usage.output_tokens` for output (the variable name “response” will differ depending on your implementation).
Monitoring these values helps you gauge how close you are to reaching the limits.
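To gauge how close you are to the limits, the token counts from each response can be summed over the last minute. This is a hypothetical Python sketch (`UsageMonitor` is not part of any SDK). It only reads the `input_tokens` and `output_tokens` attributes mentioned above, so any object with those two fields works, and its simple 60-second sliding window is only an approximation of how the limits are actually enforced.

```python
import time
from collections import deque
from types import SimpleNamespace
from typing import Any, Deque, Optional, Tuple


class UsageMonitor:
    """Hypothetical helper: sums reported token usage over the last 60 seconds."""

    WINDOW = 60.0  # seconds

    def __init__(self, input_limit: int, output_limit: int) -> None:
        self.input_limit = input_limit
        self.output_limit = output_limit
        # Each entry: (timestamp, input_tokens, output_tokens)
        self.events: Deque[Tuple[float, int, int]] = deque()

    def record(self, usage: Any, now: Optional[float] = None) -> None:
        """Store the token counts from one response's `usage` object."""
        now = time.monotonic() if now is None else now
        self.events.append((now, usage.input_tokens, usage.output_tokens))

    def utilization(self, now: Optional[float] = None) -> Tuple[float, float]:
        """Return (input, output) tokens used in the window as fractions of each limit."""
        now = time.monotonic() if now is None else now
        while self.events and now - self.events[0][0] >= self.WINDOW:
            self.events.popleft()  # drop entries older than the window
        used_in = sum(e[1] for e in self.events)
        used_out = sum(e[2] for e in self.events)
        return used_in / self.input_limit, used_out / self.output_limit


# Usage with stand-in objects shaped like `response.usage`.
monitor = UsageMonitor(input_limit=20_000, output_limit=4_000)
monitor.record(SimpleNamespace(input_tokens=1_500, output_tokens=1_800), now=0.0)
monitor.record(SimpleNamespace(input_tokens=1_500, output_tokens=1_800), now=20.0)
print(monitor.utilization(now=30.0))  # (0.15, 0.9)
print(monitor.utilization(now=65.0))  # (0.075, 0.45)
```

In the app itself, you would call `monitor.record(response.usage)` after each API call and, for example, pause when the output fraction approaches 1.0.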
The time to wait before retrying is given, in seconds, by the `retry-after` response header.
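Reading that header takes only a few lines. The helper below is a hypothetical sketch: it accepts any string mapping of headers (with the official Python SDK, the headers of a rate-limited request should be reachable through the exception’s `response.headers`), treats the value as a number of seconds, and falls back to 61 seconds when the header is missing or is not a plain number.

```python
import math
from typing import Mapping


def retry_after_seconds(headers: Mapping[str, str], fallback: float = 61.0) -> float:
    """Seconds to wait according to `retry-after`, or `fallback` if unusable."""
    # Header names are case-insensitive, so normalize them for a plain dict.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("retry-after")
    if value is None:
        return fallback
    try:
        seconds = float(value)
    except ValueError:
        # HTTP also allows a date here; this sketch does not parse that form.
        return fallback
    if not math.isfinite(seconds):
        return fallback
    return max(seconds, 0.0)


print(retry_after_seconds({"retry-after": "12"}))    # 12.0
print(retry_after_seconds({}))                       # 61.0
print(retry_after_seconds({"Retry-After": "soon"}))  # 61.0
```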
However, for our writing-assistance app, we found that we didn’t need to calculate this precisely. Instead, we implemented a fixed wait time of just over 60 seconds, which kept things simple and effective.
For reference, you can find more detailed information in Anthropic’s official documentation:
・Response codes and token usage
・Response headers
Conclusion
Rate limits in the Claude API can impact your development more than expected during the initial stages. Especially with Tier 1, the restrictions can be quite strict, so it’s important to consider them carefully when planning your app.
Although upgrading to a higher tier requires a larger deposit, simply increasing the amount is not enough. For example, to move up to Tier 2, you must maintain a balance of at least $40 for 7 days.
So, if your application is likely to use more than $40 in API credits, it’s a good idea to deposit $40 from the start, plus a few extra dollars to cover what you’ll use during the 7-day period.
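The arithmetic behind this advice is simple enough to write down. Following this article’s reading of the upgrade rule (keep the balance at or above the threshold for the whole waiting period), the deposit needed is the threshold plus the spending you expect during that period. The function name and the $0.50-per-day figure below are hypothetical.

```python
def recommended_deposit(threshold: float, daily_spend: float, waiting_days: int = 7) -> float:
    """Deposit that keeps the balance at or above `threshold` for `waiting_days`,
    assuming the app spends `daily_spend` dollars per day in the meantime."""
    return threshold + daily_spend * waiting_days


# Tier 2 ($40, 7 days) with a hypothetical $0.50 of API usage per day.
print(recommended_deposit(40.0, 0.50))  # 43.5
```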