Platform: Code4rena
Start Date: 26/08/2021
Pot Size: $100,000 USDC
Total HM: 8
Participants: 13
Period: 14 days
Judge: Albert Chon
Total Solo HM: 7
Id: 27
League: COSMOS
Rank: 1/13
Findings: 6
Award: $59,925.99
Selected for report: 8
Solo Findings: 5
Selected for report: nascent
15658.7384 USDC - $15,658.74
nascent
Manual insertion of non-UTF-8 characters in a token name will break parsing of logs and will always leave the oracle in a loop of failing and returning an error early. The fix is non-trivial and likely requires significant redesign.
Note the c0 in the last argument of the call data (invalid UTF-8).
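For reference, a standalone Rust snippet (not project code) showing that a trailing 0xc0 byte can never be decoded as UTF-8, which is exactly what the oracle's parser trips over:

fn main() {
    // "symbo" followed by 0xc0; 0xc0 is never a valid byte in UTF-8,
    // so decoding fails just as it does in the oracle's log parser.
    let bytes = vec![0x73, 0x79, 0x6d, 0x62, 0x6f, 0xc0];
    assert!(String::from_utf8(bytes).is_err());
}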
It can be triggered with:
bytes memory data = hex"f7955637000000000000000000000000000000000000000000000000000000000000008000000000000000000000000000000000000000000000000000000000000000c000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000012000000000000000000000000000000000000000000000000000000000000000461746f6d0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000046e616d6500000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000673796d626fc00000000000000000000000000000000000000000000000000000";
gravity.call(data);
The log output is as follows:
ERC20DeployedEvent("atom", "name", [utf8 decode failed]: 0x73796d626fc0, 18, 2)
Which hits this code path:
let symbol = String::from_utf8(input.data[index_start..index_end].to_vec());
trace!("Symbol {:?}", symbol);
if symbol.is_err() {
    return Err(GravityError::InvalidEventLogError(format!(
        "{:?} is not valid utf8, probably incorrect parsing",
        symbol
    )));
}
And would cause an early return here:
let erc20_deploys = Erc20DeployedEvent::from_logs(&deploys)?;
The last checked block is therefore never updated, which freezes the bridge by preventing any attestations from taking place. This is an extremely low-cost way to bring down the network.
This is a hard one. Resyncing is permanently borked: on the Go side there is seemingly no way to ever process the event nonce, because protobufs do not handle non-UTF-8 strings. The validator would report that they need event nonce N from the orchestrator, but they can never parse event N. Seemingly, validators and orchestrators would have to know to ignore that specific event nonce. But the call is permissionless, so it can be used to effectively stop attestations and the bridge permanently, until a new Gravity.sol is deployed.
One potential fix is to check in the Solidity contract that the denom, symbol, and name are valid UTF-8 strings; this will likely be expensive, though. Alternatively, you could require that validators sign ERC20 creation requests and perform checks before the transaction is sent.
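A lighter-weight, purely off-chain alternative is sketched below (illustrative only, not the project's code; it matches the fallback the sponsor suggests in the comments, and assumes the hex crate as a dependency): hex-encode the raw bytes whenever UTF-8 decoding fails, so the event can still be parsed and the event nonce can advance.

// Sketch: decode a log field as UTF-8, falling back to a hex string
// (e.g. "0x73796d626fc0") instead of returning an error.
// Assumes the `hex` crate as a dependency.
fn decode_string_field(raw: &[u8]) -> String {
    match String::from_utf8(raw.to_vec()) {
        Ok(s) => s,
        // Not valid UTF-8: keep the raw bytes in a lossless, always-valid form.
        Err(_) => format!("0x{}", hex::encode(raw)),
    }
}

The result is somewhat unergonomic for display purposes, but it keeps a maliciously crafted name from halting attestations.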
#0 - jkilpatr
2021-09-10T19:19:27Z
This is a valid and well considered bug.
I do disagree about the difficulty of the fix though, if we fail to parse the token name as utf8 we can just encode the bytes themselves in hex and pass that along. The result will be perfectly valid if a little unergonomic.
#1 - albertchon
2021-09-23T14:27:00Z
Clever, great catch
#2 - loudoguno
2021-10-01T03:45:15Z
reopening as per judge's assessment as "primary issue" on findings sheet
#3 - taitruong
2022-09-21T09:08:45Z
> This is a valid and well considered bug.
> I do disagree about the difficulty of the fix though, if we fail to parse the token name as utf8 we can just encode the bytes themselves in hex and pass that along. The result will be perfectly valid if a little unergonomic.
@jkilpatr, has this been fixed? Can't find an issue on the Gravity repo. Just curious how this bug may get solved.
#4 - jkilpatr
2022-09-21T11:32:04Z
Integration test that tests the fix every commit: https://github.com/Gravity-Bridge/Gravity-Bridge/blob/main/orchestrator/test_runner/src/invalid_events.rs
Integration test output: https://github.com/Gravity-Bridge/Gravity-Bridge/actions/runs/3062208028/jobs/4943365412
Please let me know if you can think of any test cases not in that test.
Selected for report: nascent
15658.7384 USDC - $15,658.74
nascent
Ethereum oracles watch for events on the Gravity.sol contract on the Ethereum blockchain. This is performed in the check_for_events function, run in the eth_oracle_main_loop.
In this function, there is the following code snippet:
let erc20_deployed = web3
    .check_for_events(
        starting_block.clone(),
        Some(latest_block.clone()),
        vec![gravity_contract_address],
        vec![ERC20_DEPLOYED_EVENT_SIG],
    )
    .await;
This snippet leverages the web30 library to check for events from the starting_block to the latest_block. Inside the web30 library, this nets out to calling:
pub async fn eth_get_logs(&self, new_filter: NewFilter) -> Result<Vec<Log>, Web3Error> {
    self.jsonrpc_client
        .request_method(
            "eth_getLogs",
            vec![new_filter],
            self.timeout,
            Some(10_000_000),
        )
        .await
}
The 10_000_000 specifies the maximum size of the response in bytes; if the response is larger, an error is returned:
let res: Response<R> = match res.json().limit(limit).await {
    Ok(val) => val,
    Err(e) => return Err(Web3Error::BadResponse(format!("Web3 Error {}", e))),
};
This can be triggered at will, keeping the loop in a perpetual state of returning the GravityError::EthereumRestError(Web3Error::BadResponse("Failed to get logs!".to_string())) error. To force the node into this state, you just have to deploy ERC20s via the public deployERC20 function in Gravity.sol:
function deployERC20(
    string memory _cosmosDenom,
    string memory _name,
    string memory _symbol,
    uint8 _decimals
) public {
    // Deploy an ERC20 with entire supply granted to Gravity.sol
    CosmosERC20 erc20 = new CosmosERC20(address(this), _name, _symbol, _decimals);

    // Fire an event to let the Cosmos module know
    state_lastEventNonce = state_lastEventNonce.add(1);
    emit ERC20DeployedEvent(
        _cosmosDenom,
        address(erc20),
        _name,
        _symbol,
        _decimals,
        state_lastEventNonce
    );
}
And specify a large string as the denom, name, or symbol. If an attacker uses the denom as the attack vector, they save significant gas, paying just 256 gas per additional 32 bytes. In the other cases, the string can be made up mostly of zero bytes to reduce overhead, costing just 584 gas per additional 32 bytes. This makes it feasible to exceed 10 MB of response data within the 6-block buffer. Every Ethereum oracle would be thrown into a state of perpetual errors and fall out of sync with the Ethereum blockchain. Batches, logic calls, deposits, ERC20 creations, and valset updates would never receive attestations from other validators because their Ethereum oracles would be down; the bridge would be frozen and remain frozen until the bug is fixed, due to get_last_checked_block.
This will freeze the bridge by preventing attestations from taking place.
This requires a patch to re-enable the bridge.
Handle the error more precisely and check whether you got a byte-limit error. If so, split the search range in two and try again, repeating as necessary, and combine the results (see the sketch below).
Additionally, you could require that validators sign ERC20 creation requests.
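Below is a minimal sketch of that splitting strategy. Assumptions: Log and Web3Error are the web30 types used in the snippets above (import paths omitted), fetch stands in for a check_for_events call over a block range, and the oversized-response failure surfaces as Web3Error::BadResponse as shown earlier; the real error variants may differ.

// Sketch: fetch logs for [start, end], halving the range whenever the
// node's response exceeds the byte limit, then combining the results.
async fn get_logs_chunked<F, Fut>(
    mut fetch: F,
    start: u64,
    end: u64,
) -> Result<Vec<Log>, Web3Error>
where
    F: FnMut(u64, u64) -> Fut,
    Fut: std::future::Future<Output = Result<Vec<Log>, Web3Error>>,
{
    let mut pending = vec![(start, end)];
    let mut logs = Vec::new();
    while let Some((lo, hi)) = pending.pop() {
        match fetch(lo, hi).await {
            Ok(mut chunk) => logs.append(&mut chunk),
            // Assumed: an oversized reply shows up as BadResponse.
            Err(Web3Error::BadResponse(_)) if lo < hi => {
                let mid = lo + (hi - lo) / 2;
                pending.push((lo, mid));
                pending.push((mid + 1, hi));
            }
            Err(e) => return Err(e),
        }
    }
    // Note: ranges are processed out of order; sort by block number if callers care.
    Ok(logs)
}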
#0 - jkilpatr
2021-09-10T19:17:29Z
Excellent bug report.
I just ran into the buffer limit issue this morning with an Ethereum block. I agree handling this error correctly is essential to long term reliability.
#1 - albertchon
2021-09-23T14:29:23Z
Nice :)
#2 - loudoguno
2021-10-01T03:45:25Z
reopening as per judge's assessment as "primary issue" on findings sheet
Selected for report: nascent
15658.7384 USDC - $15,658.74
nascent
In a similar vein to "Freeze The Bridge Via Large ERC20 Names/Symbols/Denoms", a sufficiently large validator set or sufficiently rapid validator update could cause both the eth_oracle_main_loop and relayer_main_loop to fall into a state of perpetual errors. In find_latest_valset, we call:
let mut all_valset_events = web3
    .check_for_events(
        end_search.clone(),
        Some(current_block.clone()),
        vec![gravity_contract_address],
        vec![VALSET_UPDATED_EVENT_SIG],
    )
    .await?;
If the validator set is sufficiently large, or updated sufficiently rapidly, this will continuously return an error whenever the logs in a 5000-block range (see const BLOCKS_TO_SEARCH: u128 = 5_000u128;) exceed 10 MB. Cosmos Hub says they will be pushing the number of validators up to 300 (currently 125). At 300 validators, each log would produce 19,328 bytes of data (4*32 + 64*300). Given this, there can be at most about 517 valset updates per 5000-block range, otherwise the node will fall out of sync.
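Reproducing the report's arithmetic as a standalone sketch (the 10,000,000-byte limit is the eth_get_logs cap shown in the previous finding):

fn main() {
    let validators: u64 = 300;
    // 4 fixed 32-byte words plus 64 bytes per validator,
    // per the report's 4*32 + 64*300 formula.
    let bytes_per_log = 4 * 32 + 64 * validators; // 19_328 bytes
    let response_limit: u64 = 10_000_000; // eth_get_logs byte cap
    let max_updates = response_limit / bytes_per_log; // 517
    println!(
        "{} bytes per ValsetUpdated log -> at most {} updates per 5000-block window",
        bytes_per_log, max_updates
    );
}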
This will freeze the bridge by preventing attestations from taking place.
This requires a patch to re-enable the bridge.
Handle the error more precisely and check whether you got a byte-limit error. If so, split the search range in two and try again, repeating as necessary, and combine the results (see the sketch under the previous finding).
#0 - jkilpatr
2021-09-10T14:57:19Z
This is a solid report with detailed computations to back it up. I appreciate it and will take actions in our web3 library to prevent this exact scenario.
#1 - loudoguno
2021-10-01T03:45:34Z
reopening as per judge's assessment as "primary issue" on findings sheet
Selected for report: nascent
4697.6215 USDC - $4,697.62
nascent
Severity: Medium
Likelihood: High
In eth_oracle_main_loop, get_last_checked_block is called, followed by:
let logic_call_executed_events = web3
    .check_for_events(
        end_search.clone(),
        Some(current_block.clone()),
        vec![gravity_contract_address],
        vec![LOGIC_CALL_EVENT_SIG],
    )
    .await;
and may hit the code path:
for event in logic_call_executed_events {
    match LogicCallExecutedEvent::from_log(&event) {
        Ok(call) => {
            trace!(
                "{} LogicCall event nonce {} last event nonce",
                call.event_nonce,
                last_event_nonce
            );
            if upcast(call.event_nonce) == last_event_nonce && event.block_number.is_some() {
                return event.block_number.unwrap();
            }
        }
        Err(e) => error!("Got ERC20Deployed event that we can't parse {}", e),
    }
}
But it will panic at from_log here:
impl LogicCallExecutedEvent {
    pub fn from_log(_input: &Log) -> Result<LogicCallExecutedEvent, GravityError> {
        unimplemented!()
    }
    // snip...
}
It can/will also be triggered here in check_for_events:
let logic_calls = LogicCallExecutedEvent::from_logs(&logic_calls)?;
Attestations will be frozen until patched.
Implement the method.
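Until full parsing is implemented, a stopgap sketch (assuming the GravityError::InvalidEventLogError variant shown in the first finding) is to return an error rather than panic, so the failure is handled by the caller's existing error path instead of aborting the process:

impl LogicCallExecutedEvent {
    pub fn from_log(_input: &Log) -> Result<LogicCallExecutedEvent, GravityError> {
        // Placeholder: surface a parse error instead of unimplemented!(),
        // which panics and takes the oracle loop down with it.
        Err(GravityError::InvalidEventLogError(
            "LogicCallExecutedEvent parsing is not implemented yet".to_string(),
        ))
    }
}

The real fix is to decode the event fields, but returning Err at least keeps the failure observable and recoverable.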
#0 - jkilpatr
2021-09-10T18:57:32Z
Valid issue, but with zero probability. Since there is nothing on the module side that currently triggers arbitrary logic.
Despite the fact that it can't currently happen this is still a good report.
#1 - loudoguno
2021-10-01T03:45:59Z
reopening as per judge's assessment as "primary issue" on findings sheet
Selected for report: nascent
4697.6215 USDC - $4,697.62
nascent
"Large Validator Sets/Rapid Validator Set Updates May Freeze the Bridge or Relayer" can affect just the relayers & not affect the oracle in certain circumstances. This could result in valid attestations, but prevent any of the other relayers from being able to participate in the execution. While the other relayers are down from the other attack, the attacker can win all batch, logic, and valset rewards as their node is the only relayer running. This is possible because find_latest_valset
is run in the main relayer loop and everytime tries for 5000 blocks of logs.
#0 - jkilpatr
2021-09-10T19:02:12Z
This is a reasonable consequence of #6
I consider it medium risk because it reduces the number of relayers active, not because of the reward assignment
#1 - loudoguno
2021-10-01T03:46:56Z
reopening as per judge's assessment as "primary issue" on findings sheet
422.7859 USDC - $422.79
nascent
Gas requirements of makeCheckpoint: If the validator set grows large enough during a period of block gas-limit expansion, then when the limit shrinks again, the gas required to perform makeCheckpoint may be larger than the amount of gas available in a block. In that case, the validator set could not be updated until the block gas limit increased. If a reduction in the upper gas limit for blocks occurs at the miner layer, it may be bricked permanently.
#0 - jkilpatr
2021-09-10T14:58:10Z
Duplicate of #30
#1 - loudoguno
2021-10-01T03:48:21Z
reopening as per judge's assessment as "primary issue" on findings sheet
Selected for report: nascent
1565.8738 USDC - $1,565.87
nascent
Tokens that prevent transfers to particular addresses (most commonly address(0), as in the OpenZeppelin standard) enable a DoS against a batch. If the attacker submits the bad transaction, the relayer won't submit the batch. The attacker never has to worry about the transaction being submitted and paying the fee, because the transaction will fail, leaving the relayer stuck with the bill. This can enable cross-chain MEV by disabling others' ability to close arbitrage between chains, denying them their transfers off the chain.
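The check the sponsor describes in the comments below belongs on the Cosmos side, before a transaction is accepted into a batch. Purely as an illustration (a Rust sketch with a hypothetical helper, not the module's actual Go code):

// Hypothetical pre-inclusion check: refuse destinations that common ERC20
// implementations (e.g. OpenZeppelin's) revert on, so a single bad transfer
// cannot block an entire batch.
fn is_acceptable_destination(dest: &[u8; 20]) -> bool {
    // The zero address is the most commonly blocked destination.
    dest != &[0u8; 20]
}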
#0 - jkilpatr
2021-09-10T19:05:19Z
The relayer will not actually pay the bill, since we simulate the tx before submission. That being said this is a valid way to block a batch for long enough that it times out.
I would describe this as low risk. Since it doesn't compromise the bridge or lose tokens, just potential value from arbitrage.
The correct solution here is to block invalid transactions from being added to batches on the Cosmos side. (which I just checked we do not block the zero address in MsgSendToEth)
#1 - loudoguno
2021-10-01T03:48:26Z
reopening as per judge's assessment as "primary issue" on findings sheet
Selected for report: nascent
1565.8738 USDC - $1,565.87
nascent
Severity: Medium
Likelihood: Low
The function utils::downcast_uint256() -> Option<u64> returns None if the input value is greater than U64MAX. If the value being downcast is read from a contract (e.g. a nonce), and the contract can be put into a state where that Uint256 is set to a higher value, this will cause all nodes to halt execution upon reading the value, requiring a patch to re-enable the bridge.
Change the signature of downcast_uint256() to return a Result<>, and/or remove any unwrap()s of the result.
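A minimal sketch of that signature change, using num_bigint::BigUint as a stand-in for the orchestrator's Uint256 type (an assumption) and reusing the GravityError::InvalidEventLogError variant from earlier snippets for brevity (a dedicated error variant would be cleaner):

use num_bigint::BigUint;
use num_traits::ToPrimitive;

// Sketch: make the failure explicit so callers must handle it with `?`
// instead of unwrapping an Option and panicking on values above u64::MAX.
fn downcast_uint256(input: &BigUint) -> Result<u64, GravityError> {
    input.to_u64().ok_or_else(|| {
        GravityError::InvalidEventLogError(format!(
            "value {} does not fit in a u64; refusing to downcast",
            input
        ))
    })
}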
#0 - jkilpatr
2021-09-10T16:14:52Z
This is valid, dealing with nonces as big-ints is something of a pain and it's reasonable to not expect these values to go over u64 max. I believe with nonce increase limitations as described in #32 this can be mitigated.
#1 - albertchon
2021-09-23T14:22:51Z
Low risk since this is very costly/impractical to make happen
#2 - loudoguno
2021-10-01T03:48:34Z
reopening as per judge's assessment as "primary issue" on findings sheet